This project analyzes the weather in the San Francisco Bay Area region of California, specifically in the cities of San Francisco, San Mateo, Santa Clara, Mountain View and San Jose. Data cleaning, manipulation and transformation were done with Pandas, a powerful Python data analysis toolkit. Additionally, there are many visualizations, some of which were prepared with the matplotlib and seaborn libraries. This project introduces you to basic Pandas concepts such as:
- data frames
- data manipulation and transformation
- data cleaning
- drawing conclusions, etc.
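The concepts above can be illustrated with a minimal sketch. It assumes a tiny hand-made DataFrame; the column names `city` and `temperature_f` and all values are hypothetical and do not come from weather.csv.

```python
import pandas as pd

# Hypothetical sample readings: city names come from the project, values are made up.
df = pd.DataFrame(
    {
        "city": ["San Francisco", "San Jose", "San Francisco", "San Jose"],
        "temperature_f": [59.0, 68.0, 61.0, None],
    }
)

# Data cleaning: drop rows with a missing temperature.
df = df.dropna(subset=["temperature_f"])

# Transformation: derive a Celsius column from the Fahrenheit one.
df["temperature_c"] = (df["temperature_f"] - 32) * 5 / 9

# Manipulation: average Fahrenheit temperature per city.
mean_by_city = df.groupby("city")["temperature_f"].mean()
print(mean_by_city)
```

Here `dropna` returns a new DataFrame, so adding the Celsius column afterwards modifies that copy rather than a view of the original.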
Why Pandas and Seaborn?
- You can pass a Pandas DataFrame directly to Seaborn plotting functions
- You can plot data from selected columns or rows
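Selecting the columns or rows to plot can be sketched as follows. This is a minimal example under assumed, hypothetical column names (`city`, `temperature`, `humidity`, `wind_speed`) with made-up values, not the actual layout of weather.csv.

```python
import pandas as pd

# Hypothetical DataFrame: column names are assumptions, not those of weather.csv.
df = pd.DataFrame(
    {
        "city": ["San Mateo", "Mountain View", "Santa Clara"],
        "temperature": [62.0, 66.0, 71.0],
        "humidity": [70, 58, 49],
        "wind_speed": [12.0, 7.0, 5.0],
    }
)

# Select only the columns of interest.
subset = df[["city", "temperature", "humidity"]]

# Select rows with a boolean mask (warmer than 65 F).
warm = subset[subset["temperature"] > 65]

# Or select rows and columns together by label with .loc.
warm_humidity = df.loc[df["temperature"] > 65, ["city", "humidity"]]
print(warm_humidity)
```

Any of these DataFrames could then be handed straight to Seaborn, for example `seaborn.scatterplot(data=warm, x="temperature", y="humidity")`.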
The following parameters were chosen for further analysis:
- temperature [F]
- humidity [%]
- pressure [inHg]
- wind speed [MPH]
- gust speed [MPH]
- cloud level [0-10]
- visibility [%]
- weather events such as rain, fog and thunderstorms
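A first look at parameters like these might combine summary statistics for the numeric columns with a count of weather events. The sketch below assumes hypothetical column names (`temperature`, `pressure`, `events`) and illustrative values only.

```python
import pandas as pd

# Hypothetical readings for a few of the chosen parameters; column names are assumed.
df = pd.DataFrame(
    {
        "temperature": [58.0, 63.0, 60.0, 55.0],
        "pressure": [30.02, 29.95, 30.10, 29.88],
        "events": ["Rain", None, "Fog", "Rain"],
    }
)

# Summary statistics (count, mean, std, min, quartiles, max) for numeric columns.
summary = df[["temperature", "pressure"]].describe()

# How often each weather event occurs; days without an event are NaN and skipped.
event_counts = df["events"].value_counts()
print(summary)
print(event_counts)
```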
You will learn:
- How to read CSV files into a Pandas DataFrame
- How to clean the data: remove missing values, drop unused columns, rename columns, etc.
- How to create plots, histograms and heat maps from a Pandas DataFrame
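The reading and cleaning steps above can be sketched as follows. To keep the example self-contained, a tiny inline CSV stands in for weather.csv; its columns (`Max_Temperature_F`, `Mean_Humidity`, `zip_code`, `notes`) and rows are hypothetical.

```python
import io

import pandas as pd

# A tiny inline CSV standing in for weather.csv; the columns here are hypothetical.
raw_csv = """date,Max_Temperature_F,Mean_Humidity,zip_code,notes
2014-01-01,61,70,94107,
2014-01-02,,65,94107,sensor check
2014-01-03,58,72,95113,
"""

# Read the CSV into a DataFrame (with a real file: pd.read_csv("weather.csv")).
df = pd.read_csv(io.StringIO(raw_csv), parse_dates=["date"])

# Remove an unused column.
df = df.drop(columns=["notes"])

# Remove rows that have missing values in the remaining columns.
df = df.dropna()

# Rename columns to shorter, friendlier names.
df = df.rename(columns={"Max_Temperature_F": "max_temp_f", "Mean_Humidity": "humidity"})
print(df)
```

Dropping the unused column before calling `dropna` matters here: otherwise the mostly empty `notes` column would cause nearly every row to be discarded.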
The project contains two files: the first holds raw CSV data taken from the U.S. Government's open data website, and the second is a Python script with all the Pandas and Seaborn code:
- weather.csv - data file generated from the U.S. Government's open data website
- main.py - main file with analysis and plots
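As an illustration of the kind of data behind the heat maps and histograms, the sketch below computes a correlation matrix and histogram bin counts with Pandas and NumPy. The column names and values are hypothetical, and this is not taken from main.py.

```python
import numpy as np
import pandas as pd

# Hypothetical numeric readings; column names and values are illustrative only.
df = pd.DataFrame(
    {
        "temperature": [55.0, 60.0, 65.0, 70.0],
        "humidity": [80.0, 70.0, 60.0, 50.0],
        "wind_speed": [10.0, 12.0, 9.0, 11.0],
    }
)

# Pairwise Pearson correlations: the matrix a heat map is usually drawn from.
corr = df.corr()

# Histogram data: counts of temperatures falling into two equal-width bins.
counts, edges = np.histogram(df["temperature"], bins=2)
print(corr)
print(counts, edges)
```

With Seaborn installed, `seaborn.heatmap(corr, annot=True)` would draw the matrix and `seaborn.histplot(data=df, x="temperature")` the histogram. These calls are a suggestion, not necessarily what main.py uses.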
- You can download the code from GitHub
- You can run the project in your browser