Real-world data rarely comes clean. In this repository, I used Python and its libraries to gather data from a variety of sources and in a variety of formats, assess its quality and tidiness, and then clean it.
The real-world data I assessed here is the tweet archive of Twitter user @dog_rates, also known as WeRateDogs. WeRateDogs is a Twitter account that rates people's dogs with a humorous comment about the dog. These ratings almost always have a denominator of 10. The numerators, though? Almost always greater than 10. 11/10, 12/10, 13/10, etc. Why? Because "they're good dogs Brent." WeRateDogs has over 6 million followers and has received international media coverage.
I gathered the data from three different sources, which are described in the Jupyter Notebook in this repository.
You need to work in a Jupyter Notebook on your computer.
The following packages (libraries) need to be installed. You can install these packages via conda or pip.
- pandas
- NumPy
- requests
- tweepy
- json (part of the Python standard library, so no separate installation is needed)
- matplotlib
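
Before opening the notebook, it can help to confirm that these packages are importable in the active environment. The snippet below is a minimal sketch, not part of the project code: the helper name `find_missing` is hypothetical, and the list simply uses the import names of the packages above (NumPy is imported as `numpy`).

```python
import importlib.util

# Import names for the packages listed above (NumPy is imported as "numpy").
REQUIRED_PACKAGES = ["pandas", "numpy", "requests", "tweepy", "json", "matplotlib"]


def find_missing(packages):
    """Return the names from `packages` that cannot be located for import.

    Hypothetical helper for illustration; it only looks each module up
    and does not actually import it.
    """
    return [name for name in packages if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = find_missing(REQUIRED_PACKAGES)
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All required packages are available.")
```

Any name reported as missing can then be installed via conda or pip.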
- All project code is contained in a Jupyter Notebook named wrangle_act.ipynb and runs without errors.
- The Jupyter Notebook has an intuitive, easy-to-follow logical structure. The code uses comments effectively and is interspersed with Jupyter Notebook Markdown cells. The steps of the data wrangling process (i.e., gather, assess, and clean) are also clearly identified with comments or Markdown cells.
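
To illustrate the gather, assess, and clean steps named above, here is a simplified sketch using only the Python standard library. It is not the code from `wrangle_act.ipynb`, which works with pandas on the real sources: the sample tweets, the function names, and the rating pattern are all assumptions made up for this example.

```python
import re

# Hypothetical sample rows standing in for the real tweet archive.
SAMPLE_TWEETS = [
    {"tweet_id": 1, "text": "This is Bella. She is a very good girl. 13/10"},
    {"tweet_id": 2, "text": "Meet Max. 12/10 would pet"},
    {"tweet_id": 2, "text": "Meet Max. 12/10 would pet"},  # duplicate row
    {"tweet_id": 3, "text": "No rating in this one"},
]

# Assumed rating format: "<numerator>/<denominator>", e.g. "13/10".
RATING_PATTERN = re.compile(r"(\d+)/(\d+)")


def gather():
    """Gather: return a fresh copy of the sample rows."""
    return [dict(row) for row in SAMPLE_TWEETS]


def assess(rows):
    """Assess: count duplicated tweet IDs and rows without a rating."""
    ids = [row["tweet_id"] for row in rows]
    return {
        "duplicate_ids": len(ids) - len(set(ids)),
        "missing_rating": sum(
            1 for row in rows if RATING_PATTERN.search(row["text"]) is None
        ),
    }


def clean(rows):
    """Clean: drop duplicates and unrated rows, and split out the rating."""
    seen = set()
    cleaned = []
    for row in rows:
        match = RATING_PATTERN.search(row["text"])
        if row["tweet_id"] in seen or match is None:
            continue
        seen.add(row["tweet_id"])
        cleaned.append(
            {
                "tweet_id": row["tweet_id"],
                "numerator": int(match.group(1)),
                "denominator": int(match.group(2)),
            }
        )
    return cleaned


if __name__ == "__main__":
    rows = gather()
    print(assess(rows))
    print(clean(rows))
```

Keeping each step in its own function mirrors the way the notebook separates the three stages, so an issue found during assessment maps to one specific cleaning action.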
Feel free to suggest any changes or contribute to the project.
This project is covered under the MIT License.