In [5]:
import pandas as pd

# Pandas DataFrames - Learning Notes

## What is Pandas?
- Pandas is a Python library for data manipulation and analysis.
- It provides two main data structures: **Series** (1D) and **DataFrame** (2D).

## Key Concepts Covered
1. **Importing pandas**: Use `import pandas as pd`.
2. **Creating DataFrames**: You can create a DataFrame from a dictionary.
3. **Reading CSV files**: Use `pd.read_csv('filename.csv')` to load data.
4. **Data Analysis Methods**:
   - `.max()`: Find the maximum value in a column.
   - `.mean()`: Calculate the average value in a column.
   - Filtering: Select rows with a boolean condition, e.g. `df[df['col'] == value]`.
5. **Data Cleaning (Munging)**:
   - Use `.fillna(0, inplace=True)` to replace missing values with 0; `inplace=True` modifies the DataFrame directly instead of returning a new one.
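
A minimal sketch of concepts 2 and 4 above: building a DataFrame from a dictionary, then using `.max()`, `.mean()`, and a boolean filter. The column names and values are hypothetical sample data, not taken from `nyc_weather.csv`.

```python
import pandas as pd

# Hypothetical sample data (assumption: not the real nyc_weather.csv contents).
weather = pd.DataFrame({
    "day": ["1/1/2016", "1/2/2016", "1/3/2016", "1/4/2016"],
    "temperature": [38, 36, 40, 25],
    "windspeed": [8, 7, 8, 9],
    "event": ["Sunny", "Rain", "Snow", "Rain"],
})

# Largest value in the temperature column.
max_temp = weather["temperature"].max()

# Arithmetic mean of the windspeed column.
avg_wind = weather["windspeed"].mean()

# Boolean-mask filter: keep the "day" values of rows where event is "Rain".
rainy_days = weather.loc[weather["event"] == "Rain", "day"]

print(max_temp)             # 40
print(avg_wind)             # 8.0
print(rainy_days.tolist())  # ['1/2/2016', '1/4/2016']
```

Each key of the dictionary becomes a column, and each list supplies that column's values, so all lists need the same length.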

## Example Workflow
- Import pandas
- Create or load a DataFrame
- Analyze data (max, mean, filter)
- Clean data (handle missing values)
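
The cleaning step of this workflow can be sketched as follows. The CSV text is a hypothetical stand-in for a file on disk, read through `io.StringIO` so the example is self-contained. It also shows one possible alternative to a blanket `fillna(0)`: passing a dictionary so each column gets its own fill value. The fill value `"No Event"` is an assumption chosen for illustration.

```python
import io
import pandas as pd

# Hypothetical CSV text (assumption: illustrative values only).
csv_text = """day,windspeed,event
1/1/2016,8,
1/2/2016,,Rain
1/3/2016,4,Snow
"""

# read_csv accepts a file-like object as well as a filename.
df = pd.read_csv(io.StringIO(csv_text))

# .mean() skips missing values by default: (8 + 4) / 2
mean_skipping_nan = df["windspeed"].mean()

# Fill each column with its own value; this returns a new DataFrame
# and leaves df unchanged because inplace is not used.
cleaned = df.fillna({"windspeed": 0, "event": "No Event"})

# The filled-in zero now counts toward the average: (8 + 0 + 4) / 3
mean_after_fill = cleaned["windspeed"].mean()

print(mean_skipping_nan)  # 6.0
print(mean_after_fill)    # 4.0
```

Filling numeric gaps with 0 changes statistics such as the mean, so it is worth checking whether 0 is a meaningful value for the column before choosing it.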

---
Use these notes as a quick reference for your pandas DataFrame operations!

In [12]:
retrive_csv = pd.read_csv('nyc_weather.csv')
print(retrive_csv)

                                                                                                Unnamed: 0  \
EST       Temperature DewPoint Humidity Sea Level PressureIn VisibilityMiles WindSpeedMPH  PrecipitationIn   
1/1/2016  38          23       52       30.03                10              8                           0   
1/2/2016  36          18       46       30.02                10              7                           0   
1/3/2016  40          21       47       29.86                10              8                           0   
1/4/2016  25          9        44       30.05                10              9                           0   
1/5/2016  20          -3       41       30.57                10              5                           0   
1/6/2016  33          4        35       30.5                 10              4                           0   
1/7/2016  39          11       33       30.28                10              2                           0   
1/8/2016  

In [4]:
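# Highest temperature in the dataset
max_temp = retrive_csv['Temperature'].max()
print(max_temp)

# Average wind speed (.mean() skips missing values by default)
avg_wind = retrive_csv['WindSpeedMPH'].mean()
print(avg_wind)

# Dates on which the recorded event was rain, selected with a boolean mask
dates_rain = retrive_csv.loc[retrive_csv['Events'] == 'Rain', 'EST']
print(dates_rain)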


50
6.892857142857143
8      1/9/2016
9     1/10/2016
15    1/16/2016
26    1/27/2016
Name: EST, dtype: object


In [12]:
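# Data munging (data cleaning): replace every missing value with 0.
# The zeros count toward .mean(), so if WindSpeedMPH had missing entries,
# the average printed below is lower than the NaN-skipping one computed earlier.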

retrive_csv.fillna(0, inplace=True)
print(retrive_csv)
avg_wind = retrive_csv['WindSpeedMPH'].mean()
print(avg_wind)

          EST  Temperature  DewPoint  Humidity  Sea Level PressureIn  \
0    1/1/2016           38        23        52                 30.03   
1    1/2/2016           36        18        46                 30.02   
2    1/3/2016           40        21        47                 29.86   
3    1/4/2016           25         9        44                 30.05   
4    1/5/2016           20        -3        41                 30.57   
5    1/6/2016           33         4        35                 30.50   
6    1/7/2016           39        11        33                 30.28   
7    1/8/2016           39        29        64                 30.20   
8    1/9/2016           44        38        77                 30.16   
9   1/10/2016           50        46        71                 29.59   
10  1/11/2016           33         8        37                 29.92   
11  1/12/2016           35        15        53                 29.85   
12  1/13/2016           26         4        42                 2