pranav-gupta-7/Web-Scraper-And-Classifier-For-HIV-Articles-

Web-Scraper-And-Classifier-For-HIV-Articles

Categories used for classification-

1- Support/Facilitation/Awareness
2- Research/Development/Biomedical Improvement
3- Society Discrimination/Negative Impact
4- Patient’s progress/Infected improved life
5- Surge/HIV positive cases
6- HIV negative cases
7- Accident/Death cases
8- Suicide Cases
0- other
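
For reference, the numeric labels above can be held in a simple lookup table. This is only a minimal sketch: the names `CATEGORY_LABELS` and `label_name` are hypothetical and are not taken from Classifier.ipynb, which may encode the labels differently.

```python
# Hypothetical mapping of label id -> category name, copied from the list above.
CATEGORY_LABELS = {
    0: "other",
    1: "Support/Facilitation/Awareness",
    2: "Research/Development/Biomedical Improvement",
    3: "Society Discrimination/Negative Impact",
    4: "Patient’s progress/Infected improved life",
    5: "Surge/HIV positive cases",
    6: "HIV negative cases",
    7: "Accident/Death cases",
    8: "Suicide Cases",
}


def label_name(label_id):
    """Return the category name for a label id, falling back to 'other'."""
    return CATEGORY_LABELS.get(label_id, CATEGORY_LABELS[0])
```

A predicted class index from the classifier could then be turned into a readable name with `label_name(predicted_index)`.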

Requirements-

1-Python
2-NLTK
3-Keras
4-Matplotlib (for visualization)
5-spaCy (for extracting place names from the articles)
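
Before running the scripts, it may help to check that the libraries listed above are importable. The snippet below is a small sketch using only the standard library; the helper name `missing_packages` is hypothetical, and the import names (`nltk`, `keras`, `matplotlib`, `spacy`) are assumed to match how the packages are installed in your environment.

```python
import importlib.util

# Import names assumed for the requirements listed above.
REQUIRED_PACKAGES = ["nltk", "keras", "matplotlib", "spacy"]


def missing_packages(names):
    """Return the subset of top-level package names that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]
```

Calling `missing_packages(REQUIRED_PACKAGES)` returns an empty list when everything is installed, and otherwise the names that still need to be installed.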

File Structure

 .
└── Web-Scraper-And-Classifier-For-HIV-Articles-
   ├── Classifier.ipynb
   ├── Data
   │   └── final.xlsx
   ├── hiv_article_dataset_creator.py
   ├── LICENSE
   ├── news_articles_url_scrapper.py
   ├── README.md
   └── visualization
       ├── all.png
       ├── death.png
       ├── matrimony.png
       ├── pie.png
       ├── suicide.png
       └── visualization.py

How to use the repo

  1. First, run news_articles_url_scrapper.py. It scrapes all the articles published between two given dates and dumps their URLs into a CSV file named all_articles.csv.

  2. Then run hiv_article_dataset_creator.py. It scrapes the HIV articles from the all_articles.csv file and creates a hiv_report_data.xlsx file containing the columns Year, Heading and Content for each HIV article.

  3. Run Classifier.ipynb to classify the HIV articles into the categories listed above.

  4. Run visualization.py if you want a visual report on death cases, suicide cases, matrimony-related articles and the places mentioned in the surge/epidemic category.
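
To illustrate the shape of the data produced in step 2, here is a rough sketch of filtering scraped articles down to HIV-related ones and arranging them into rows with the Year, Heading and Content columns. It is not the code from hiv_article_dataset_creator.py: the keyword pattern and the names `is_hiv_related` and `build_rows` are assumptions made for illustration, and the real script may select articles differently.

```python
import re

# Assumed keyword filter: "HIV" in any letter case, "AIDS" only in upper case
# so that ordinary words such as "hearing aids" are not matched.
HIV_PATTERN = re.compile(r"\b(?:[Hh][Ii][Vv]|AIDS)\b")


def is_hiv_related(text):
    """Return True if the text mentions HIV or AIDS."""
    return bool(HIV_PATTERN.search(text))


def build_rows(articles):
    """Keep HIV-related articles and shape them like the spreadsheet rows.

    `articles` is an iterable of (year, heading, content) tuples.
    """
    rows = []
    for year, heading, content in articles:
        if is_hiv_related(heading) or is_hiv_related(content):
            rows.append({"Year": year, "Heading": heading, "Content": content})
    return rows
```

A list of rows in this form could then be written to a spreadsheet, for example with a pandas DataFrame, although this README does not say which library the script itself uses.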