In [4]:
import pandas as pd

# Pandas DataFrames - Theory, Examples, and Implementations

## 1. What is pandas?
- **Theory:** pandas is a Python library for data analysis and manipulation. Its two core data structures are the Series (a labeled one-dimensional array) and the DataFrame (a labeled two-dimensional table made of columns).
- **Use Case:** Data cleaning, analysis, and visualization in Python.
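A quick way to see the 1D/2D distinction is to compare the `ndim` attribute of a Series and a DataFrame. This is a minimal sketch; the values are illustrative and borrowed from the weather data used later in this notebook.

```python
import pandas as pd

# A Series is a single labeled column of values (1D)
s = pd.Series([32, 35, 28], name='temperature')

# A DataFrame is a table of labeled columns (2D)
frame = pd.DataFrame({'temperature': [32, 35, 28], 'windspeed': [6, 7, 2]})

print(s.ndim, frame.ndim)                   # 1 2
print(frame.shape)                          # (3, 2)

# Selecting one column of a DataFrame gives back a Series
print(type(frame['temperature']).__name__)  # Series
```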

---

## 2. Importing pandas
- **Theory:** pandas must be imported before any of its features can be used.
- **Example:**
  ```python
  import pandas as pd
  ```
- **Implementation:** By convention, pandas is imported under the alias `pd`, so its functions and classes are accessed as `pd.read_csv()`, `pd.DataFrame()`, and so on.

---

## 3. Loading Data from CSV
- **Theory:** Data is often stored in CSV files. pandas can read these files into DataFrames.
- **Example:**
  ```python
  df = pd.read_csv('weather_data.csv')
  df
  ```
- **Implementation:** Use `pd.read_csv()` to load data for analysis.
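`pd.read_csv()` accepts a file-like object as well as a file path, which makes it easy to experiment without a file on disk. This sketch assumes the CSV has the four columns shown in this notebook's output and uses `io.StringIO` as a stand-in for `weather_data.csv`.

```python
import io

import pandas as pd

# CSV text copied from the rows this notebook prints for weather_data.csv;
# io.StringIO makes the string behave like an open text file.
csv_text = """day,temperature,windspeed,event
1/1/2017,32,6,Rain
1/2/2017,35,7,Sunny
1/3/2017,28,2,Snow
1/4/2017,24,7,Snow
1/5/2017,32,4,Rain
1/6/2017,31,2,Sunny
"""

df = pd.read_csv(io.StringIO(csv_text))
print(df.shape)             # (6, 4)
print(df.columns.tolist())  # ['day', 'temperature', 'windspeed', 'event']
```

The header row supplies the column names, and pandas infers numeric types for `temperature` and `windspeed`. The `day` column stays as text unless you ask for date parsing (for example with the `parse_dates` parameter).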

---

## 4. Creating DataFrames from Dictionaries
- **Theory:** You can create DataFrames from Python dictionaries for small datasets or testing.
- **Example:**
  ```python
  data = {
      'student': ["Amit", "John", "Jacob", "David", "Steve"],
      'rank': [1, 2, 3, 4, 5],
      'marks': [95, 70, 80, 60, 90]
  }
  df_dict = pd.DataFrame(data)
  print('student records \n \n', df_dict)
  ```
- **Implementation:** Use `pd.DataFrame()` to convert a dictionary to a DataFrame.
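The dictionary above is column-oriented (one list per column). `pd.DataFrame()` also accepts row-oriented data, such as a list of dictionaries with one dictionary per row. The sketch below builds the same student table both ways and checks that they match; the `records` name is just illustrative.

```python
import pandas as pd

# Column-oriented: one key per column, one list of values per column
data = {
    'student': ["Amit", "John", "Jacob", "David", "Steve"],
    'rank': [1, 2, 3, 4, 5],
    'marks': [95, 70, 80, 60, 90],
}
df_dict = pd.DataFrame(data)

# Row-oriented: one dict per row; the keys become column names
records = [
    {'student': "Amit", 'rank': 1, 'marks': 95},
    {'student': "John", 'rank': 2, 'marks': 70},
    {'student': "Jacob", 'rank': 3, 'marks': 80},
    {'student': "David", 'rank': 4, 'marks': 60},
    {'student': "Steve", 'rank': 5, 'marks': 90},
]
df_records = pd.DataFrame(records)

print(df_records.equals(df_dict))  # True

# In the column-oriented form, every list must have the same length
try:
    pd.DataFrame({'student': ["Amit", "John"], 'rank': [1]})
except ValueError as err:
    print('ValueError:', err)
```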

---

## 5. Exploring Data
- **Theory:** It's important to understand the structure and contents of your data.
- **Examples:**
  ```python
  df.shape         # (rows, columns)
  df.head()        # First 5 rows
  df.head(2)       # First 2 rows
  df.tail()        # Last 5 rows
  df.tail(2)       # Last 2 rows
  ```
- **Implementation:** Use these methods to quickly inspect your data.
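The inspection methods above return ordinary values and DataFrames, so their results can be checked directly. This sketch rebuilds the weather data from the rows printed in this notebook rather than reading `weather_data.csv`.

```python
import pandas as pd

# Weather rows as printed in this notebook (stand-in for weather_data.csv)
df = pd.DataFrame({
    'day': ['1/1/2017', '1/2/2017', '1/3/2017', '1/4/2017', '1/5/2017', '1/6/2017'],
    'temperature': [32, 35, 28, 24, 32, 31],
    'windspeed': [6, 7, 2, 7, 4, 2],
    'event': ['Rain', 'Sunny', 'Snow', 'Snow', 'Rain', 'Sunny'],
})

rows, cols = df.shape              # shape is a (rows, columns) tuple
print(rows, cols)                  # 6 4
print(len(df))                     # 6 (number of rows)

# head/tail return DataFrames; their index shows which rows were kept
print(df.head(2).index.tolist())   # [0, 1]
print(df.tail(2).index.tolist())   # [4, 5]

# Asking for more rows than exist simply returns all of them
print(df.head(10).shape)           # (6, 4)
```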

---

## 6. Indexing and Selecting Data
- **Theory:** Select specific rows and columns for analysis.
- **Examples:**
  ```python
  df[1:4]                  # Rows at positions 1, 2, 3 (end excluded)
  df['day']                # Access 'day' column
  df.day                   # Attribute access (only for names that are valid identifiers)
  df[['event', 'temperature']]  # Multiple columns
  ```
- **Implementation:** Use slicing and column selection to focus on relevant data.
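Slicing with `df[1:4]` is positional, so the end position is excluded. pandas also offers explicit indexers: `.iloc` (by position, end excluded) and `.loc` (by label, end included). On the default integer index, the sketch below shows all three selecting the same rows. The weather data is rebuilt from this notebook's printed output.

```python
import pandas as pd

# Weather rows as printed in this notebook (stand-in for weather_data.csv)
df = pd.DataFrame({
    'day': ['1/1/2017', '1/2/2017', '1/3/2017', '1/4/2017', '1/5/2017', '1/6/2017'],
    'temperature': [32, 35, 28, 24, 32, 31],
    'windspeed': [6, 7, 2, 7, 4, 2],
    'event': ['Rain', 'Sunny', 'Snow', 'Snow', 'Rain', 'Sunny'],
})

sliced = df[1:4]                      # positional: start included, end excluded
print(sliced.index.tolist())          # [1, 2, 3]

print(df.iloc[1:4].equals(sliced))    # True: .iloc is positional, end excluded
print(df.loc[1:3].equals(sliced))     # True: .loc is label-based, end included

# One column name gives a Series; a list of names gives a DataFrame
print(type(df['day']).__name__)                     # Series
print(type(df[['event', 'temperature']]).__name__)  # DataFrame
```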

---

## 7. Data Types
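- **Theory:** Each column in a DataFrame is a Series, and each Series holds values of a single data type (dtype), such as `int64` for whole numbers.
- **Example:**
  ```python
  type(df['event'])   # the container type: a Series
  df.dtypes           # the dtype of every column
  ```
- **Implementation:** Use `type()` to confirm that a column is a Series, and `.dtype` (one column) or `.dtypes` (whole DataFrame) to check the type of the values it holds.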
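The difference between a column's Python type and its dtype can be checked programmatically with the helpers in `pandas.api.types`. The exact dtype name of a text column depends on the pandas version (older versions use `object`), so this sketch tests categories of dtype rather than exact names. The weather data is rebuilt from the notebook's printed output.

```python
import pandas as pd
from pandas.api.types import is_integer_dtype, is_numeric_dtype

# Weather rows as printed in this notebook (stand-in for weather_data.csv)
df = pd.DataFrame({
    'day': ['1/1/2017', '1/2/2017', '1/3/2017', '1/4/2017', '1/5/2017', '1/6/2017'],
    'temperature': [32, 35, 28, 24, 32, 31],
    'windspeed': [6, 7, 2, 7, 4, 2],
    'event': ['Rain', 'Sunny', 'Snow', 'Snow', 'Rain', 'Sunny'],
})

print(type(df['event']))          # the container: a pandas Series
print(df['temperature'].dtype)    # the element type of one column
print(df.dtypes)                  # the element type of every column

# dtype categories are more portable than exact dtype names
print(is_integer_dtype(df['temperature']))  # True
print(is_numeric_dtype(df['event']))        # False
```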

---

## 8. Data Operations
- **Theory:** pandas provides methods for computing descriptive statistics on columns.
- **Examples:**
  ```python
  df['temperature'].max()   # Maximum value
  df['temperature'].mean()  # Mean value
  df['temperature'].min()   # Minimum value
  df['temperature'].std()   # Standard deviation
  df.describe()             # Summary statistics (numeric columns by default)
  ```
- **Implementation:** Use these methods for quick data insights.
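The mean and standard deviation that this notebook prints for `temperature` can be reproduced by hand. Note that `.std()` defaults to the sample standard deviation (`ddof=1`, dividing by n - 1); passing `ddof=0` gives the population version. This sketch uses only the temperature values from the notebook's output.

```python
import math

import pandas as pd

temps = pd.Series([32, 35, 28, 24, 32, 31], name='temperature')

# Mean by hand: sum divided by count
manual_mean = temps.sum() / len(temps)

# Sample standard deviation by hand: divide squared deviations by n - 1
manual_std = math.sqrt(((temps - manual_mean) ** 2).sum() / (len(temps) - 1))

print(math.isclose(manual_mean, temps.mean()))  # True
print(math.isclose(manual_std, temps.std()))    # True

# Population standard deviation divides by n instead
print(temps.std(ddof=0))

# Several statistics at once, returned as a Series
print(temps.agg(['min', 'max', 'mean']))
```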

---

## 9. Filtering Data
- **Theory:** Extract rows that meet certain conditions.
- **Examples:**
  ```python
  df[df['temperature'] >= 30]
  df[df['temperature'] == df['temperature'].max()]
  df[['temperature', 'day']][df.temperature >= 30]
  ```
- **Implementation:** Use boolean indexing to filter data.
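Boolean indexing works by building a Series of `True`/`False` values (a mask) and using it to select rows. Masks can be combined with `&` (and) and `|` (or), with each condition wrapped in parentheses, and `.loc` can apply a mask and a column list in one step. The weather data below is rebuilt from this notebook's printed output.

```python
import pandas as pd

# Weather rows as printed in this notebook (stand-in for weather_data.csv)
df = pd.DataFrame({
    'day': ['1/1/2017', '1/2/2017', '1/3/2017', '1/4/2017', '1/5/2017', '1/6/2017'],
    'temperature': [32, 35, 28, 24, 32, 31],
    'windspeed': [6, 7, 2, 7, 4, 2],
    'event': ['Rain', 'Sunny', 'Snow', 'Snow', 'Rain', 'Sunny'],
})

hot = df['temperature'] >= 30          # the mask itself
print(hot.tolist())                    # [True, True, False, False, True, True]

# Combine conditions with &; parentheses are required around each one
hot_and_rainy = df[(df['temperature'] >= 30) & (df['event'] == 'Rain')]
print(hot_and_rainy['day'].tolist())   # ['1/1/2017', '1/5/2017']

# .loc takes a row mask and a column list together
print(df.loc[hot, ['temperature', 'day']])

# isin() matches any of several values
snow_or_sun = df[df['event'].isin(['Snow', 'Sunny'])]
print(len(snow_or_sun))                # 4
```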

---

## 10. Index Manipulation
- **Theory:** The index is used to label rows. You can set or reset it as needed.
- **Examples:**
  ```python
  df.set_index('day', inplace=True)   # Set 'day' as index
  df.loc['1/1/2017']                  # Access row by index label
  df.reset_index(inplace=True)        # Reset to default integer index
  ```
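- **Implementation:** Use `set_index` and `reset_index` to choose which column labels the rows, and `loc` to select rows by index label. With `inplace=True` these methods modify `df` directly; without it, they return a new DataFrame and leave `df` unchanged.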
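The same steps can be written without `inplace=True` by keeping the returned DataFrame, which makes it clear which object has which index. This sketch rebuilds the weather data from the notebook's printed output; `by_day` and `restored` are illustrative names.

```python
import pandas as pd

# Weather rows as printed in this notebook (stand-in for weather_data.csv)
df = pd.DataFrame({
    'day': ['1/1/2017', '1/2/2017', '1/3/2017', '1/4/2017', '1/5/2017', '1/6/2017'],
    'temperature': [32, 35, 28, 24, 32, 31],
    'windspeed': [6, 7, 2, 7, 4, 2],
    'event': ['Rain', 'Sunny', 'Snow', 'Snow', 'Rain', 'Sunny'],
})

by_day = df.set_index('day')        # new DataFrame; df keeps its integer index
print('day' in df.columns)          # True
print(by_day.index.name)            # day

# .loc takes a row label and, optionally, a column label
print(by_day.loc['1/1/2017', 'temperature'])  # 32

restored = by_day.reset_index()     # 'day' becomes a regular column again
print(restored.columns.tolist())    # ['day', 'temperature', 'windspeed', 'event']
print(restored.equals(df))          # True
```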

---

## 11. General Tips
- Use `print(df)`, or put `df` on the last line of a notebook cell, to view your DataFrame.
- Use `.info()` and `.describe()` to understand your data.
- Practice with small examples to build confidence.
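`.info()` prints a compact summary (index range, column names, non-null counts, and dtypes) to standard output. Its `buf` parameter redirects that text to any writable object, which this sketch uses to capture the summary as a string. The weather data is rebuilt from this notebook's printed output.

```python
import io

import pandas as pd

# Weather rows as printed in this notebook (stand-in for weather_data.csv)
df = pd.DataFrame({
    'day': ['1/1/2017', '1/2/2017', '1/3/2017', '1/4/2017', '1/5/2017', '1/6/2017'],
    'temperature': [32, 35, 28, 24, 32, 31],
    'windspeed': [6, 7, 2, 7, 4, 2],
    'event': ['Rain', 'Sunny', 'Snow', 'Snow', 'Rain', 'Sunny'],
})

buf = io.StringIO()
df.info(buf=buf)          # write the summary into buf instead of printing it
summary = buf.getvalue()
print(summary)

# Count missing values per column (all zero for this data)
print(df.isna().sum())
```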

---

These notes provide theory, examples, and code implementations for all the pandas DataFrame concepts you practiced in this notebook. Use them as a reference for both understanding and applying pandas in your projects!

In [16]:
df = pd.read_csv('weather_data.csv')
df

        day  temperature  windspeed  event
0  1/1/2017           32          6   Rain
1  1/2/2017           35          7  Sunny
2  1/3/2017           28          2   Snow
3  1/4/2017           24          7   Snow
4  1/5/2017           32          4   Rain
5  1/6/2017           31          2  Sunny


In [4]:
# create a pandas dataframe

# dataset
data = {
    'student': ["Amit", "John", "Jacob", "David", "Steve"],
    'rank': [1, 2, 3, 4, 5],
    'marks': [95, 70, 80, 60, 90]
}

df_dict = pd.DataFrame(data)
print('student records \n \n', df_dict)

student records 
 
   student  rank  marks
0    Amit     1     95
1    John     2     70
2   Jacob     3     80
3   David     4     60
4   Steve     5     90


In [12]:
df_shape = df.shape
print(df_shape)

dfhead = df.head()
print(dfhead)

df_sp_head = df.head(2)
print(df_sp_head)

dftail = df.tail()
print(dftail)

df_sp_tail = df.tail(2)
print(df_sp_tail)

(6, 4)
        day  temperature  windspeed  event
0  1/1/2017           32          6   Rain
1  1/2/2017           35          7  Sunny
2  1/3/2017           28          2   Snow
3  1/4/2017           24          7   Snow
4  1/5/2017           32          4   Rain
        day  temperature  windspeed  event
0  1/1/2017           32          6   Rain
1  1/2/2017           35          7  Sunny
        day  temperature  windspeed  event
1  1/2/2017           35          7  Sunny
2  1/3/2017           28          2   Snow
3  1/4/2017           24          7   Snow
4  1/5/2017           32          4   Rain
5  1/6/2017           31          2  Sunny
        day  temperature  windspeed  event
4  1/5/2017           32          4   Rain
5  1/6/2017           31          2  Sunny


In [15]:
# indexing in pandas
print(df[1:4])

print(df.columns)

print(df.day)
print(df['day'])

        day  temperature  windspeed  event
1  1/2/2017           35          7  Sunny
2  1/3/2017           28          2   Snow
3  1/4/2017           24          7   Snow
Index(['day', 'temperature', 'windspeed', 'event'], dtype='object')
0    1/1/2017
1    1/2/2017
2    1/3/2017
3    1/4/2017
4    1/5/2017
5    1/6/2017
Name: day, dtype: object
0    1/1/2017
1    1/2/2017
2    1/3/2017
3    1/4/2017
4    1/5/2017
5    1/6/2017
Name: day, dtype: object


In [21]:
print(type(df['event']))

df[['event','temperature']]

<class 'pandas.core.series.Series'>


   event  temperature
0   Rain           32
1  Sunny           35
2   Snow           28
3   Snow           24
4   Rain           32
5  Sunny           31


In [25]:
# operations in pandas
# max
print(df['temperature'].max())

# average
print(df['temperature'].mean())

# min
print(df['temperature'].min())

# standard deviation
print(df['temperature'].std())

35
30.333333333333332
24
3.8297084310253524


In [26]:
df.describe()

       temperature  windspeed
count     6.000000   6.000000
mean     30.333333   4.666667
std       3.829708   2.338090
min      24.000000   2.000000
25%      28.750000   2.500000
50%      31.500000   5.000000
75%      32.000000   6.750000
max      35.000000   7.000000


In [7]:
print(df[df['temperature'] >= 30])  # equivalent: df[df.temperature >= 30]

print(df[df['temperature'] == df['temperature'].max()])

print(df[['temperature','day']][df.temperature >= 30])

        day  temperature  windspeed  event
0  1/1/2017           32          6   Rain
1  1/2/2017           35          7  Sunny
4  1/5/2017           32          4   Rain
5  1/6/2017           31          2  Sunny
        day  temperature  windspeed  event
1  1/2/2017           35          7  Sunny
   temperature       day
0           32  1/1/2017
1           35  1/2/2017
4           32  1/5/2017
5           31  1/6/2017


In [None]:
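df.index                            # default RangeIndex (0 to 5)

df.set_index('day')                 # returns a new DataFrame; df is unchanged

df.set_index('day', inplace=True)   # modifies df directly and returns None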

In [19]:
print(df)

df.loc['1/1/2017']

          temperature  windspeed  event
day                                    
1/1/2017           32          6   Rain
1/2/2017           35          7  Sunny
1/3/2017           28          2   Snow
1/4/2017           24          7   Snow
1/5/2017           32          4   Rain
1/6/2017           31          2  Sunny


temperature      32
windspeed         6
event          Rain
Name: 1/1/2017, dtype: object

In [20]:
df.reset_index(inplace=True)

print(df)

        day  temperature  windspeed  event
0  1/1/2017           32          6   Rain
1  1/2/2017           35          7  Sunny
2  1/3/2017           28          2   Snow
3  1/4/2017           24          7   Snow
4  1/5/2017           32          4   Rain
5  1/6/2017           31          2  Sunny
