banglanews

This is a Python package for collecting the textual content of online Bengali newspapers. Its main purpose is to provide Bengali text to data scientists who need it for research. This package was created for academic AND non-commercial use ONLY. The author does not encourage OR suggest using this application for anything other than academic research or experimental work.

How to install

The easiest way to install it is from a Python 3 environment, using the following command:

pip install banglanews

Alternatively, you can download the package file banglanews-0.0.2.tar.gz and put it in a directory. Then, from any Python 3 environment, open a terminal, change to the directory containing the package, and install it using the following command:

pip install banglanews-0.0.2.tar.gz
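Whichever method you use, you can confirm which version ended up in your environment with a short check. This is a minimal sketch relying only on the standard library's importlib.metadata (Python 3.8+); the helper name installed_version is hypothetical and is not part of banglanews.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(dist_name: str = "banglanews") -> Optional[str]:
    """Return the installed version string of a distribution, or None if absent.

    Hypothetical helper for illustration; not part of banglanews.
    """
    try:
        return version(dist_name)
    except PackageNotFoundError:
        # The distribution is not installed in the current environment.
        return None


if __name__ == "__main__":
    found = installed_version()
    print(found if found else "banglanews is not installed")
```

If the install succeeded, this should report the version you installed (0.0.2 at the time of writing).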

How to use it in the code

Currently the package supports only the leading Bengali newspaper 'Prothom Alo'. Import the module into your code as follows:

from banglanews import prothomalo

Initializing the class

The package contains a single class named scraper. Initialize it as follows:

objScraper = prothomalo.scraper('2021-12-01','2021-12-05','D:\\Content')

1st argument: start_date = the date from which the scraper starts collecting content.
2nd argument: end_date = the date at which the scraper stops collecting content.
3rd argument: output_dir = the file system location where the content is written.

Please note that all three arguments are mandatory and that the dates must be in YYYY-MM-DD format.
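Because a malformed date is an easy mistake, it may help to validate the arguments before creating the scraper. The sketch below is a hypothetical helper (check_scraper_args is not part of banglanews, which may validate differently). It enforces the YYYY-MM-DD format stated above and, as an additional assumption, treats a start_date later than end_date as an error.

```python
import re
from datetime import date, datetime
from typing import Tuple

# Exactly four digits, dash, two digits, dash, two digits.
DATE_PATTERN = re.compile(r"\d{4}-\d{2}-\d{2}")


def check_scraper_args(start_date: str, end_date: str, output_dir: str) -> Tuple[date, date]:
    """Validate scraper arguments and return the parsed (start, end) dates.

    Hypothetical helper for illustration; not part of banglanews.
    """
    if not output_dir:
        raise ValueError("output_dir is mandatory")
    parsed = []
    for name, value in (("start_date", start_date), ("end_date", end_date)):
        # fullmatch enforces the YYYY-MM-DD shape; strptime then rejects
        # impossible calendar dates such as 2021-02-30.
        if not DATE_PATTERN.fullmatch(value):
            raise ValueError(f"{name} must be in YYYY-MM-DD format, got {value!r}")
        parsed.append(datetime.strptime(value, "%Y-%m-%d").date())
    start, end = parsed
    # Assumption: a reversed date range is treated as a mistake.
    if start > end:
        raise ValueError("start_date must not be later than end_date")
    return start, end
```

For example, check_scraper_args('2021-12-01', '2021-12-05', 'D:\\Content') returns the two dates as datetime.date objects, while passing '2021/12/01' as the start date raises ValueError before any scraping starts.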

Important methods:

PrintContents: writes each article to its own text file in the file system
PrintURLs: writes the headlines and URLs to a single pipe-delimited CSV file in the file system
PrintComments: writes the headlines, URLs and comments to a single pipe-delimited CSV file in the file system
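Since PrintContents writes one text file per article, its output can be loaded back for analysis with a few lines of standard-library code. This is a sketch under assumptions: the .txt extension, the UTF-8 encoding, and the helper name load_articles are hypothetical, not documented behaviour of banglanews, so adjust them to what the package actually writes.

```python
from pathlib import Path
from typing import Dict


def load_articles(output_dir: str, pattern: str = "*.txt") -> Dict[str, str]:
    """Read every file matching pattern in output_dir into {file name: text}.

    Hypothetical helper for illustration; not part of banglanews.
    """
    articles = {}
    # sorted() gives a stable order regardless of the file system.
    for path in sorted(Path(output_dir).glob(pattern)):
        # Assumed UTF-8 so that Bengali script is decoded correctly.
        articles[path.name] = path.read_text(encoding="utf-8")
    return articles
```

Keying the result by file name keeps each article traceable to its file on disk, which is useful when building a corpus.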

A sample call

1st argument (optional): search_text = the text to search for. If no value is passed, the output covers all URLs between start_date and end_date.

objScraper.PrintURLs('করোনা')

The above call searches for the Bengali word 'করোনা' ('corona') and creates a .csv file in the output_dir location provided when the class was initialized.
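The name of the generated .csv file and whether it has a header row are not documented here, so the sketch below makes both points explicit assumptions. It picks the most recently modified .csv file in output_dir and reads it as pipe-delimited UTF-8 text with Python's csv module. The helper names latest_csv and read_pipe_csv are hypothetical and not part of banglanews.

```python
import csv
from pathlib import Path
from typing import List


def latest_csv(output_dir: str) -> Path:
    """Return the most recently modified .csv file in output_dir (hypothetical helper)."""
    candidates = list(Path(output_dir).glob("*.csv"))
    if not candidates:
        raise FileNotFoundError(f"no .csv file found in {output_dir}")
    return max(candidates, key=lambda p: p.stat().st_mtime)


def read_pipe_csv(path: Path) -> List[List[str]]:
    """Parse a pipe-delimited file into a list of rows (hypothetical helper)."""
    # newline="" is what the csv module expects for files it reads;
    # UTF-8 is assumed so that Bengali headlines decode correctly.
    with open(path, encoding="utf-8", newline="") as f:
        return [row for row in csv.reader(f, delimiter="|")]
```

For example, rows = read_pipe_csv(latest_csv('D:\\Content')) returns one list of fields per line. Whether the first row is a header, and which column holds the headline, should be checked against an actual output file.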

Version information

Latest version: 0.0.2
Previous versions:

Source repository: neolithian/banglanews
