
Real-world data rarely comes clean. In this repository, I will be using Python and its libraries to gather data from a variety of sources and in a variety of formats, assess its quality and tidiness, then clean it.


somya1212/Wrangle-and-Analyze-Twitter-Archive


Wrangle-and-Analyze-Twitter-Archive

Real-world data rarely comes clean. In this repository, I used Python and its libraries to gather data from a variety of sources and in a variety of formats, assess its quality and tidiness, then clean it.

Dataset:

The real-world data I assessed here is the tweet archive of Twitter user @dog_rates, also known as WeRateDogs. WeRateDogs is a Twitter account that rates people's dogs with a humorous comment about each dog. These ratings almost always have a denominator of 10. The numerators, though? Almost always greater than 10: 11/10, 12/10, 13/10, etc. Why? Because "they're good dogs Brent." WeRateDogs has over 6 million followers and has received international media coverage.
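Because each rating is embedded in the tweet text as a numerator/denominator pair, it has to be parsed out before it can be analyzed. The sketch below shows one possible approach with the standard `re` module; it is not necessarily what the notebook does. The regular expression, the function name `extract_rating`, and the choice to keep the last match in the text (so that an earlier fraction such as the date "4/20" is skipped) are all assumptions.

```python
import re
from typing import Optional, Tuple

# Hypothetical pattern: a numerator (optionally with a decimal part, e.g. "13.5")
# followed by "/" and an integer denominator.
RATING_PATTERN = re.compile(r"(\d+(?:\.\d+)?)/(\d+)")


def extract_rating(text: str) -> Optional[Tuple[float, int]]:
    """Return the last numerator/denominator pair found in the text, or None.

    Taking the last match is a heuristic assumption: ratings usually come at
    the end of a tweet, after any other fractions such as dates.
    """
    matches = RATING_PATTERN.findall(text)
    if not matches:
        return None
    numerator, denominator = matches[-1]
    return float(numerator), int(denominator)
```

For example, `extract_rating("Happy 4/20. 13/10 would pet")` returns `(13.0, 10)`, and a text without any fraction returns `None`.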

Sources:

I gathered data from three different sources, all of which are described in the Jupyter Notebook in this repository.
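One common way to store tweet data gathered through the Twitter API is to write each tweet's JSON on its own line of a text file and read it back with the standard `json` module. The sketch below assumes that layout; the file name `tweet_json.txt` and the function name `load_tweet_counts` are hypothetical. `id`, `retweet_count` and `favorite_count` are standard fields of Twitter API v1.1 tweet objects, but which fields the notebook actually keeps is not stated here.

```python
import json
from typing import Dict, Iterable, List


def load_tweet_counts(lines: Iterable[str]) -> List[Dict[str, int]]:
    """Parse one JSON tweet object per line and keep its id and engagement counts."""
    rows = []
    for line in lines:
        line = line.strip()
        if not line:  # skip blank lines, e.g. a trailing newline at the end of the file
            continue
        tweet = json.loads(line)
        rows.append({
            "tweet_id": tweet["id"],
            # Default to 0 if a count is missing from the stored object.
            "retweet_count": tweet.get("retweet_count", 0),
            "favorite_count": tweet.get("favorite_count", 0),
        })
    return rows


# Usage with a hypothetical file written during the gathering step:
# with open("tweet_json.txt", encoding="utf-8") as f:
#     counts = load_tweet_counts(f)
```

Reading line by line keeps one malformed or partial tweet from affecting the others and avoids loading one large JSON array into memory.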

Requirements:

You need Jupyter Notebook installed on your computer.

The following packages (libraries) need to be installed. You can install these packages via conda or pip.

  • pandas
  • NumPy
  • requests
  • tweepy
  • json (part of the Python standard library, so it does not need to be installed)
  • matplotlib
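To confirm that the environment is ready before opening the notebook, a small check like the one below can help. It is a sketch that only tests whether each module can be located with `importlib.util.find_spec`; it does not check versions. Note that NumPy's import name is `numpy`. The third-party packages can be installed with, for example, `pip install pandas numpy requests tweepy matplotlib`.

```python
import importlib.util
from typing import Iterable, List

# Import names for the packages listed above ("json" ships with Python,
# so it should always be found).
REQUIRED = ["pandas", "numpy", "requests", "tweepy", "json", "matplotlib"]


def missing_packages(names: Iterable[str]) -> List[str]:
    """Return the module names that cannot be found in the current environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages(REQUIRED)
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All required packages are installed.")
```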

Code Functionality:

  • All project code is contained in a Jupyter Notebook named wrangle_act.ipynb and runs without errors.
  • The Jupyter Notebook has an intuitive, easy-to-follow logical structure. The code uses comments effectively and is interspersed with Jupyter Notebook Markdown cells. The steps of the data wrangling process (i.e., gather, assess, and clean) are also clearly identified with comments or Markdown cells.
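As a rough illustration of how the assess and clean steps relate, the sketch below works on a plain list of dictionaries rather than the pandas DataFrames a notebook like this would typically use. The field names (`tweet_id`, `rating_denominator`, `retweeted_status_id`) and the specific checks are hypothetical examples of common quality issues, not the notebook's actual list.

```python
from typing import Dict, List


def assess(rows: List[Dict]) -> Dict[str, int]:
    """Count a few example quality issues (hypothetical checks, for illustration)."""
    ids = [row["tweet_id"] for row in rows]
    return {
        "duplicate_ids": len(ids) - len(set(ids)),
        "non_10_denominators": sum(1 for row in rows if row["rating_denominator"] != 10),
        "retweets": sum(1 for row in rows if row.get("retweeted_status_id") is not None),
    }


def clean(rows: List[Dict]) -> List[Dict]:
    """Drop retweets and duplicate ids, keeping the first occurrence of each tweet."""
    seen = set()
    cleaned = []
    for row in rows:
        if row.get("retweeted_status_id") is not None:  # not an original rating
            continue
        if row["tweet_id"] in seen:  # duplicate record
            continue
        seen.add(row["tweet_id"])
        cleaned.append(row)
    return cleaned
```

Keeping assessment separate from cleaning makes it possible to re-run `assess` on the cleaned output and confirm that each issue it counted has been resolved.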

Contribution:

Feel free to suggest any changes or contribute to the project.

License:

This project is licensed under the MIT License.
