**In this notebook, we will learn:**
* **What is a DataFrame?**
* **Different ways of creating a DataFrame**
* **Operations on a DataFrame**

# What is a DataFrame?

A pandas DataFrame is a two-dimensional, size-mutable, potentially heterogeneous tabular data structure with labeled axes (rows and columns). Data is arranged in rows and columns, and each column can hold a different data type. A DataFrame has three principal components: the data, the row labels (the index), and the column labels.
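
The three components can be inspected directly. The following is a minimal sketch on a tiny made-up frame (`df_demo` and its row labels are illustrative and not part of the practice files):

```python
import pandas as pd

# A tiny, made-up frame used only for illustration.
df_demo = pd.DataFrame(
    {"temperature": [32, 35], "event": ["Rain", "Sunny"]},
    index=["day_1", "day_2"],
)

print(df_demo.index.tolist())    # row labels
print(df_demo.columns.tolist())  # column labels
print(df_demo.to_numpy())        # the underlying data as a NumPy array
print(df_demo.dtypes)            # each column keeps its own dtype
```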

## Different Ways of Creating a DataFrame

* **Using CSV file**

In [1]:
import pandas as pd

df = pd.read_csv("Practice Files/weather_data.csv")
df #DataFrame

,day,temperature,windspeed,event
0,1/1/2017,32,6,Rain
1,1/2/2017,35,7,Sunny
2,1/3/2017,28,2,Snow
3,1/4/2017,24,7,Snow
4,1/5/2017,32,4,Rain
5,1/6/2017,31,2,Sunny
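
In the output above, the `day` column is read as plain text. If real dates are wanted, `read_csv` accepts a `parse_dates` argument. The sketch below reads from an in-memory string instead of the practice file so that it runs on its own; `csv_text` and `df_csv` are illustrative names.

```python
import io

import pandas as pd

# In-memory stand-in for a CSV file with the same columns (illustrative data).
csv_text = """day,temperature,windspeed,event
1/1/2017,32,6,Rain
1/2/2017,35,7,Sunny
1/3/2017,28,2,Snow
"""

# parse_dates asks pandas to convert the listed columns to datetimes.
df_csv = pd.read_csv(io.StringIO(csv_text), parse_dates=["day"])
print(df_csv.dtypes)
```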


* **Using Excel file**

In [2]:
# Excel file path and the name of the sheet to read
df = pd.read_excel("Practice Files/weather_data.xlsx", sheet_name="weather_data")
df  # display the DataFrame

,day,temperature,windspeed,event
0,2017-01-01,32,6,Rain
1,2017-01-02,35,7,Sunny
2,2017-01-03,28,2,Snow
3,2017-01-04,24,7,Snow
4,2017-01-05,32,4,Rain
5,2017-01-06,31,2,Sunny


* **Using Dictionary**

In [9]:
weather_data = {
    'day': ['1/1/2017','1/2/2017','1/3/2017','1/4/2017','1/5/2017','1/6/2017'],
    'temperature': [32,35,28,24,32,31],
    'windspeed': [6,7,2,7,4,2],
    'event': ['Rain', 'Sunny', 'Snow','Snow','Rain', 'Sunny']
}
df1 = pd.DataFrame(weather_data)  # each key becomes a column, each list its values
df1

,day,temperature,windspeed,event
0,1/1/2017,32,6,Rain
1,1/2/2017,35,7,Sunny
2,1/3/2017,28,2,Snow
3,1/4/2017,24,7,Snow
4,1/5/2017,32,4,Rain
5,1/6/2017,31,2,Sunny
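
When building from a dictionary, the optional `columns` and `index` arguments of `pd.DataFrame` can select or reorder columns and supply custom row labels. A minimal sketch, with made-up labels `a`, `b`, `c`:

```python
import pandas as pd

weather = {
    "day": ["1/1/2017", "1/2/2017", "1/3/2017"],
    "temperature": [32, 35, 28],
    "event": ["Rain", "Sunny", "Snow"],
}

# Keep only two of the keys, in a chosen order, and label the rows by hand.
df_subset = pd.DataFrame(
    weather,
    columns=["event", "temperature"],
    index=["a", "b", "c"],  # hypothetical row labels
)
print(df_subset)
```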


* **Using List of Tuples**

In [80]:
# Named `rows` rather than `list` to avoid shadowing the built-in list type
rows = [
    ("1/1/2017",32,6,"Rain"),
    ("1/2/2017",35,7,"Sunny"),
    ("1/3/2017",28,2,"Snow"),
]
df2 = pd.DataFrame(rows, columns=["Day","temperature","windspeed","event"])
df2

,Day,temperature,windspeed,event
0,1/1/2017,32,6,Rain
1,1/2/2017,35,7,Sunny
2,1/3/2017,28,2,Snow


* **Using List of Dictionaries**

In [81]:
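# Named `records` rather than `list` to avoid shadowing the built-in list type
records = [
    {"day":"1/1/2017","temperature":32,"windspeed":6,"event":"Rain"},
    {"day":"1/2/2017","temperature":35,"windspeed":7,"event":"Sunny"},
    {"day":"1/3/2017","temperature":28,"windspeed":2,"event":"Snow"},
]

df3 = pd.DataFrame(records)  # dictionary keys become the column names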
df3

,day,temperature,windspeed,event
0,1/1/2017,32,6,Rain
1,1/2/2017,35,7,Sunny
2,1/3/2017,28,2,Snow
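
With a list of dictionaries, the records do not need identical keys: pandas takes the union of the keys as columns and fills the gaps with missing values. A small sketch with made-up records:

```python
import pandas as pd

records = [
    {"day": "1/1/2017", "temperature": 32, "event": "Rain"},
    {"day": "1/2/2017", "temperature": 35},  # no "event" key in this record
]

df_records = pd.DataFrame(records)
print(df_records)

# The missing "event" for the second row is stored as a missing value.
print(df_records["event"].isna().tolist())
```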


# Operations on a DataFrame

In [28]:
print(type(df))
print(type(df.event))

<class 'pandas.core.frame.DataFrame'>
<class 'pandas.core.series.Series'>
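
As the output shows, selecting one column gives a Series. Passing a list of column names instead keeps the result a DataFrame, even for a single column. A minimal sketch on an illustrative frame:

```python
import pandas as pd

df_demo = pd.DataFrame({"temperature": [32, 35], "event": ["Rain", "Sunny"]})

one_column = df_demo["event"]    # single label: a Series
sub_frame = df_demo[["event"]]   # list of labels: a DataFrame

print(type(one_column).__name__)
print(type(sub_frame).__name__)
```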


In [3]:
df.describe()   # summary statistics (count, mean, std, quartiles) for the numeric columns

,temperature,windspeed
count,6.0,6.0
mean,30.333333,4.666667
std,3.829708,2.33809
min,24.0,2.0
25%,28.75,2.5
50%,31.5,5.0
75%,32.0,6.75
max,35.0,7.0
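
By default `describe()` covers only the numeric columns, which is why `day` and `event` are absent above. Passing `include="all"` adds summaries such as `unique` and `freq` for the text columns. A sketch on a copy of the weather data:

```python
import pandas as pd

df_demo = pd.DataFrame({
    "temperature": [32, 35, 28, 24, 32, 31],
    "event": ["Rain", "Sunny", "Snow", "Snow", "Rain", "Sunny"],
})

# include="all" summarises every column; statistics that do not apply are NaN.
summary = df_demo.describe(include="all")
print(summary)
```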


In [4]:
# To get individual statistics, call the aggregation method on a single column
print("Max Temperature = ",df['temperature'].max())
print("Average Windspeed = ",df['windspeed'].mean())

Max Temperature =  35
Average Windspeed =  4.666666666666667
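
Several statistics can also be requested at once with `agg`, and `idxmax` gives the row label at which the maximum occurs. A minimal sketch on illustrative data:

```python
import pandas as pd

df_demo = pd.DataFrame({
    "day": ["1/1/2017", "1/2/2017", "1/3/2017"],
    "temperature": [32, 35, 28],
})

# A Series indexed by the names of the requested statistics.
stats = df_demo["temperature"].agg(["min", "max", "mean"])
print(stats)

# idxmax returns the index label of the maximum; loc fetches that row.
hottest_row = df_demo.loc[df_demo["temperature"].idxmax()]
print(hottest_row["day"])
```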


In [5]:
print("Columns = ",df.columns)

Columns =  Index(['day', 'temperature', 'windspeed', 'event'], dtype='object')


In [22]:
df['day']     # df.day

0    1/1/2017
1    1/2/2017
2    1/3/2017
3    1/4/2017
4    1/5/2017
5    1/6/2017
Name: day, dtype: object

In [8]:
rows,columns = df.shape
rows,columns

(6, 4)

In [10]:
df.head(2) # first 2 rows

,day,temperature,windspeed,event
0,1/1/2017,32,6,Rain
1,1/2/2017,35,7,Sunny


In [11]:
df.tail(2) # last 2 rows

,day,temperature,windspeed,event
4,1/5/2017,32,4,Rain
5,1/6/2017,31,2,Sunny


In [18]:
# Indexing and slicing: df[1:4] selects the rows at positions 1, 2 and 3
print(df.index)
df[1:4]

RangeIndex(start=0, stop=6, step=1)


,day,temperature,windspeed,event
1,1/2/2017,35,7,Sunny
2,1/3/2017,28,2,Snow
3,1/4/2017,24,7,Snow
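
The slice `df[1:4]` works by position. The explicit accessors make the intent clearer: `iloc` slices by position and excludes the end, while `loc` slices by label and includes it. A sketch on an illustrative frame with the default integer index:

```python
import pandas as pd

df_demo = pd.DataFrame({"temperature": [32, 35, 28, 24, 32, 31]})

by_position = df_demo.iloc[1:4]  # positions 1, 2, 3 (end excluded)
by_label = df_demo.loc[1:4]      # labels 1 through 4 (end included)

print(len(by_position), len(by_label))
```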


In [20]:
# Conditional selection: the day(s) on which the temperature was highest
df.loc[df['temperature'] == df['temperature'].max(), 'day']

1    1/2/2017
Name: day, dtype: object
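
Conditions can be combined with `&` (and) and `|` (or), with each condition wrapped in parentheses, and `isin` tests membership in a list of values. A minimal sketch on a copy of the weather data:

```python
import pandas as pd

df_demo = pd.DataFrame({
    "day": ["1/1/2017", "1/2/2017", "1/3/2017", "1/4/2017", "1/5/2017", "1/6/2017"],
    "temperature": [32, 35, 28, 24, 32, 31],
    "event": ["Rain", "Sunny", "Snow", "Snow", "Rain", "Sunny"],
})

# Warm days that were not sunny.
mask = (df_demo["temperature"] > 30) & (df_demo["event"] != "Sunny")
print(df_demo.loc[mask, "day"].tolist())

# Rows whose event is one of several values.
rain_or_snow = df_demo[df_demo["event"].isin(["Rain", "Snow"])]
print(len(rain_or_snow))
```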

In [62]:
df.set_index('event',inplace=True)

In [63]:
df

event,day,temperature,windspeed
Rain,1/1/2017,32,6
Sunny,1/2/2017,35,7
Snow,1/3/2017,28,2
Snow,1/4/2017,24,7
Rain,1/5/2017,32,4
Sunny,1/6/2017,31,2


In [64]:
df.loc['Snow']

event,day,temperature,windspeed
Snow,1/3/2017,28,2
Snow,1/4/2017,24,7


In [65]:
df.reset_index(inplace=True)

In [67]:
df

,event,day,temperature,windspeed
0,Rain,1/1/2017,32,6
1,Sunny,1/2/2017,35,7
2,Snow,1/3/2017,28,2
3,Snow,1/4/2017,24,7
4,Rain,1/5/2017,32,4
5,Sunny,1/6/2017,31,2


In [70]:
df.set_index('day',inplace=True)

In [71]:
df

day,event,temperature,windspeed
1/1/2017,Rain,32,6
1/2/2017,Sunny,35,7
1/3/2017,Snow,28,2
1/4/2017,Snow,24,7
1/5/2017,Rain,32,4
1/6/2017,Sunny,31,2


In [72]:
df.loc['1/3/2017']

event          Snow
temperature      28
windspeed         2
Name: 1/3/2017, dtype: object
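
Note that `df.loc['Snow']` returned a DataFrame while `df.loc['1/3/2017']` returned a Series. The difference comes from the index: a label that occurs more than once yields every matching row, whereas a unique label yields a single row as a Series. A sketch on illustrative data:

```python
import pandas as pd

df_demo = pd.DataFrame({
    "day": ["1/3/2017", "1/4/2017", "1/5/2017"],
    "temperature": [28, 24, 32],
    "event": ["Snow", "Snow", "Rain"],
})

by_event = df_demo.set_index("event")  # "Snow" appears twice in this index
by_day = df_demo.set_index("day")      # every day label is unique

snow_rows = by_event.loc["Snow"]       # repeated label: DataFrame
one_day = by_day.loc["1/3/2017"]       # unique label: Series

print(by_event.index.is_unique, by_day.index.is_unique)
print(type(snow_rows).__name__, type(one_day).__name__)
```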

In [73]:
df.reset_index(inplace=True)
df

,day,event,temperature,windspeed
0,1/1/2017,Rain,32,6
1,1/2/2017,Sunny,35,7
2,1/3/2017,Snow,28,2
3,1/4/2017,Snow,24,7
4,1/5/2017,Rain,32,4
5,1/6/2017,Sunny,31,2
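
The cells above modify `df` in place. As an alternative, `set_index` and `reset_index` return a new DataFrame when `inplace` is left at its default, so the original stays unchanged. A minimal sketch with illustrative names:

```python
import pandas as pd

df_demo = pd.DataFrame({
    "day": ["1/1/2017", "1/2/2017"],
    "temperature": [32, 35],
    "event": ["Rain", "Sunny"],
})

indexed = df_demo.set_index("day")        # new frame; df_demo is untouched
restored = indexed.reset_index()          # "day" moves back to a column
dropped = indexed.reset_index(drop=True)  # the old index is discarded

print(df_demo.columns.tolist())
print(restored.columns.tolist())
print(dropped.columns.tolist())
```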
