This case study investigates the web and social media content of three popular alternative health websites: mercola.com, childrenshealthdefense.org, and greenmedinfo.com. All articles published in 2020 were scraped from the three sites using RSelenium and rvest. Twitter data was collected with MassMine, and Facebook data was collected from CrowdTangle. All posts and text files were cleaned, and text and sentiment analysis was performed on all content for comparison. Web traffic data for all three websites was acquired from Semrush and analyzed.

The code and analysis are in the R Markdown file and the accompanying functions file. Sample scrape files for the three websites are included, along with a sample file for text cleaning. Additional scraping and data cleaning were performed for the datasets used in the final report. The PDF report shows the output of the R Markdown file with some revisions. Because of their size, the data files are not posted but may be shared on request.
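
The cleaning and sentiment steps described above can be illustrated with a minimal sketch. This is not the repository's code, which is written in R; it is a Python illustration under stated assumptions. The function names (`clean_text`, `sentiment_score`), the tiny word lists, and the stopword set are all hypothetical, and a real analysis would rely on a published sentiment lexicon rather than a handful of words.

```python
import re

# Hypothetical toy lexicon and stopword list, for illustration only.
POSITIVE = {"safe", "healthy", "benefit", "natural", "good"}
NEGATIVE = {"danger", "toxic", "harm", "risk", "bad"}
STOPWORDS = {"the", "a", "an", "and", "of", "to", "is", "are",
             "in", "for", "this", "that"}


def clean_text(text):
    """Lowercase the text, strip URLs, mentions, hashtags, and
    non-letter characters, then drop stopwords. Returns a token list."""
    text = text.lower()
    text = re.sub(r"https?://\S+", " ", text)  # remove URLs
    text = re.sub(r"[@#]\w+", " ", text)       # remove @mentions and #hashtags
    text = re.sub(r"[^a-z\s]", " ", text)      # keep letters and whitespace only
    return [tok for tok in text.split() if tok not in STOPWORDS]


def sentiment_score(tokens):
    """Count positive tokens minus negative tokens (simple lexicon score)."""
    pos = sum(tok in POSITIVE for tok in tokens)
    neg = sum(tok in NEGATIVE for tok in tokens)
    return pos - neg


if __name__ == "__main__":
    post = "Check this: https://example.com #Health The vaccine is TOXIC and a risk!"
    tokens = clean_text(post)
    print(tokens, sentiment_score(tokens))
```

Applying the same cleaning function to articles, tweets, and Facebook posts keeps the resulting scores comparable across sources, which is the purpose of cleaning all content before the comparison.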
jeannereppert/Capstone-Project-Misinformation-and-Covid-19-in-Health-Websites
About
A text and sentiment analysis case study of web content, tweets, and Facebook posts for three popular alternative health websites. Web traffic analytics data from Semrush is also investigated for the three websites.