# Pandas DataFrame
A Pandas DataFrame is a two-dimensional labeled data structure whose columns can hold different data types, such as integers, floats, and strings. It is one of the most widely used data structures in data analysis and machine learning tasks.

In this course, we will cover the basics of using Pandas DataFrame for data analysis. We will start with an introduction to Pandas DataFrame and then move on to topics such as data manipulation, data cleaning, and data visualization.

In [1]:
import pandas as pd

## 01. Creating a Pandas DataFrame

### 01. Creating DataFrame from Dictionary

In [2]:
weather_dict = {
    "day": ["1/1/2020", "1/2/2020", "1/3/2020", "1/4/2020", "1/5/2020", "1/6/2020"],
    "temperature": [32, 35, 28, 24, 32, 31],
    "windspeed": [6, 7, 2, 7, 4, 2],
    "event": ["Rain", "Sunny", "Snow", "Snow", "Rain", "Sunny"]
}

In [3]:
# Creating dataframe from dictionary
df1 = pd.DataFrame(weather_dict)
df1

,day,temperature,windspeed,event
0,1/1/2020,32,6,Rain
1,1/2/2020,35,7,Sunny
2,1/3/2020,28,2,Snow
3,1/4/2020,24,7,Snow
4,1/5/2020,32,4,Rain
5,1/6/2020,31,2,Sunny
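
To make the point about mixed column types concrete, here is a minimal sketch that rebuilds a shortened version of the dictionary above and inspects each column's dtype. The name `weather` is introduced only for this illustration, and the exact dtype shown for text columns may differ between pandas versions.

```python
import pandas as pd

# A shortened copy of the weather data, rebuilt so this block runs on its own.
weather = pd.DataFrame({
    "day": ["1/1/2020", "1/2/2020", "1/3/2020"],
    "temperature": [32, 35, 28],
    "windspeed": [6, 7, 2],
    "event": ["Rain", "Sunny", "Snow"],
})

# Each column carries its own dtype: an integer dtype for the numeric
# columns and a string/object dtype for the text columns.
column_dtypes = weather.dtypes
print(column_dtypes)

# The dictionary keys become the column labels, in insertion order.
column_names = list(weather.columns)
print(column_names)
```

Checking `dtypes` right after building a DataFrame is a quick way to confirm that numbers were not accidentally stored as text.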


### 02. Creating DataFrame from CSV

In [4]:
# Raw string (r"...") so the backslashes in the Windows path are not treated as escape sequences
path = r"D:\Coding\Git Repository\Data-Science-Bootcamp-with-Python\Datasets\sample_weather_data.csv"
df2 = pd.read_csv(path)
df2

,day,temperature,windspeed,event
0,01-01-2020,32,6,Rain
1,01-02-2020,35,7,Sunny
2,01-03-2020,28,2,Snow
3,01-04-2020,24,7,Snow
4,01-05-2020,32,4,Rain
5,01-06-2020,32,2,Sunny
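
The path above only exists on the author's machine, so the following sketch reads the same kind of data from an in-memory string instead. The inline text `csv_text` is an assumed stand-in for the file's contents, and the names `sample` and `sample_dates` are introduced only for this illustration. It also shows the `parse_dates` parameter of `read_csv`, assuming the dates are written month first.

```python
import io

import pandas as pd

# Inline stand-in for the CSV file; the real file's contents are assumed
# to look like the table shown above.
csv_text = """day,temperature,windspeed,event
01-01-2020,32,6,Rain
01-02-2020,35,7,Sunny
01-03-2020,28,2,Snow
"""

# read_csv accepts any file-like object, not only a path on disk.
sample = pd.read_csv(io.StringIO(csv_text))

# Dates are read as plain strings unless requested otherwise;
# parse_dates converts the named columns to datetime values.
sample_dates = pd.read_csv(io.StringIO(csv_text), parse_dates=["day"])
print(sample_dates.dtypes)
```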


In [5]:
# Print the type of the df2 variable
type(df2)

pandas.core.frame.DataFrame

In [6]:
# Unpack the shape of the dataframe into row and column counts, then show the row count
rows, columns = df2.shape
rows

6

## 02. head() Method
In Pandas, the head() method is used to view the first few rows of a DataFrame. By default, it displays the first 5 rows of the DataFrame. This method is useful to get a quick overview of the data in the DataFrame.

In [7]:
# Print first five rows of the dataframe
df2.head()

,day,temperature,windspeed,event
0,01-01-2020,32,6,Rain
1,01-02-2020,35,7,Sunny
2,01-03-2020,28,2,Snow
3,01-04-2020,24,7,Snow
4,01-05-2020,32,4,Rain


In [8]:
# Print the first two rows of the dataframe
df2.head(2)

,day,temperature,windspeed,event
0,01-01-2020,32,6,Rain
1,01-02-2020,35,7,Sunny


## 03. tail() Method
In Pandas, the tail() method is used to view the last few rows of a DataFrame. By default, it displays the last 5 rows of the DataFrame. This method is useful for checking how the data ends, for example to confirm that the final rows were loaded correctly.

In [9]:
# Print last five rows of dataframe
df2.tail()

,day,temperature,windspeed,event
1,01-02-2020,35,7,Sunny
2,01-03-2020,28,2,Snow
3,01-04-2020,24,7,Snow
4,01-05-2020,32,4,Rain
5,01-06-2020,32,2,Sunny


In [10]:
# Print the last two rows of the dataframe
df2.tail(2)

,day,temperature,windspeed,event
4,01-05-2020,32,4,Rain
5,01-06-2020,32,2,Sunny
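
Both head() and tail() also accept a negative count, which excludes rows instead of selecting them. The sketch below illustrates this on a small single-column frame. The name `temps` is introduced only for this example.

```python
import pandas as pd

# Six temperature readings, matching the CSV data used above.
temps = pd.DataFrame({"temperature": [32, 35, 28, 24, 32, 32]})

first_three = temps.head(3)          # rows 0, 1, 2
all_but_last_two = temps.head(-2)    # rows 0 to 3
last_three = temps.tail(3)           # rows 3, 4, 5
all_but_first_two = temps.tail(-2)   # rows 2 to 5

# Asking for more rows than exist simply returns the whole frame.
print(len(temps.head(100)))
```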


## 04. Indexing and Slicing in DataFrame

In [11]:
# Print the rows at positions 2 to 4 (the end of the slice, 5, is excluded)
df2[2:5]

,day,temperature,windspeed,event
2,01-03-2020,28,2,Snow
3,01-04-2020,24,7,Snow
4,01-05-2020,32,4,Rain
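
Plain slicing such as `df2[2:5]` works by position. As a hedged sketch, the example below contrasts the explicit position-based `iloc` with the label-based `loc`, which treats the end of a slice differently. The frame `weather` is a reduced copy of the data introduced only for this illustration.

```python
import pandas as pd

weather = pd.DataFrame({
    "day": ["01-01-2020", "01-02-2020", "01-03-2020",
            "01-04-2020", "01-05-2020", "01-06-2020"],
    "temperature": [32, 35, 28, 24, 32, 32],
})

# iloc slices by position and excludes the end, like a Python list.
by_position = weather.iloc[2:5]

# loc slices by label and includes the end label.
by_label = weather.loc[2:4]

# With the default 0..n-1 index, both select the same three rows.
print(by_position.equals(by_label))

# loc can also pick a single cell by row label and column name.
single_cell = weather.loc[3, "temperature"]
print(single_cell)
```

Using `iloc` or `loc` explicitly makes it clear to the reader whether a slice refers to positions or labels, which matters once the index is no longer the default range.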


In [12]:
# Print the names of the columns
df2.columns

Index(['day', 'temperature', 'windspeed', 'event'], dtype='object')

In [13]:
# Print individual column of the dataframe
df2.day

0    01-01-2020
1    01-02-2020
2    01-03-2020
3    01-04-2020
4    01-05-2020
5    01-06-2020
Name: day, dtype: object

In [14]:
# Print the type of a column
type(df2["event"])

pandas.core.series.Series

In [15]:
# Print specific columns from dataframe
df2[["day", "temperature", "event"]]

,day,temperature,event
0,01-01-2020,32,Rain
1,01-02-2020,35,Sunny
2,01-03-2020,28,Snow
3,01-04-2020,24,Snow
4,01-05-2020,32,Rain
5,01-06-2020,32,Sunny


## 05. Operations with DataFrame

### 01. max() Method

In [16]:
# Print the maximum temperature
df2["temperature"].max()

35

### 02. min() Method

In [17]:
# Print the minimum temperature
df2["temperature"].min()

24

### 03. mean() Method

In [18]:
# Print the mean (average) of the temperature
df2["temperature"].mean()

30.5

### 04. std() Method

In [19]:
# Print the standard deviation of the temperature
df2["temperature"].std()

3.8858718455450894

### 05. describe() Method

In [20]:
# Print summary statistics for the numeric columns of the dataframe
df2.describe()

,temperature,windspeed
count,6.0,6.0
mean,30.5,4.666667
std,3.885872,2.33809
min,24.0,2.0
25%,29.0,2.5
50%,32.0,5.0
75%,32.0,6.75
max,35.0,7.0
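
When only a few statistics are needed, they can be requested in one call with `agg()` instead of calling each method separately. The sketch below also applies describe() to a text column, which reports counts rather than numeric statistics. The frame `weather` and the names `summary` and `event_summary` are introduced only for this illustration.

```python
import pandas as pd

weather = pd.DataFrame({
    "temperature": [32, 35, 28, 24, 32, 32],
    "event": ["Rain", "Sunny", "Snow", "Snow", "Rain", "Sunny"],
})

# Compute several statistics for one column in a single call.
summary = weather["temperature"].agg(["min", "max", "mean", "std"])
print(summary)

# For a text column, describe() reports count, unique, top, and freq.
event_summary = weather["event"].describe()
print(event_summary)
```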


## 06. Conditional Selection in DataFrame

In [21]:
# Print all the rows where temperature greater than or equal to 30
df2[df2.temperature >= 30]

,day,temperature,windspeed,event
0,01-01-2020,32,6,Rain
1,01-02-2020,35,7,Sunny
4,01-05-2020,32,4,Rain
5,01-06-2020,32,2,Sunny


In [22]:
# Print the row where temperature is maximum
df2[df2.temperature == df2["temperature"].max()]

,day,temperature,windspeed,event
1,01-02-2020,35,7,Sunny


In [23]:
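# Print only the day and temperature columns for the row where the temperature is maximum
# (loc selects rows and columns in one step instead of chaining two selections)
df2.loc[df2["temperature"] == df2["temperature"].max(), ["day", "temperature"]]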

,day,temperature
1,01-02-2020,35
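
Conditions can be combined to express more specific filters. The following sketch uses `&` for "and" together with `isin()` for membership tests. Each condition needs its own parentheses because `&` binds more tightly than comparison operators. The frame `weather` and the result names are introduced only for this illustration.

```python
import pandas as pd

weather = pd.DataFrame({
    "day": ["01-01-2020", "01-02-2020", "01-03-2020",
            "01-04-2020", "01-05-2020", "01-06-2020"],
    "temperature": [32, 35, 28, 24, 32, 32],
    "windspeed": [6, 7, 2, 7, 4, 2],
    "event": ["Rain", "Sunny", "Snow", "Snow", "Rain", "Sunny"],
})

# Rows that are both warm (>= 30) and calm (windspeed below 5).
warm_and_calm = weather[(weather["temperature"] >= 30) & (weather["windspeed"] < 5)]

# Rows whose event is one of several values.
snow_or_rain = weather[weather["event"].isin(["Snow", "Rain"])]

print(warm_and_calm)
print(snow_or_rain)
```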


## 07. set_index() Method
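In Pandas, the set_index() method is used to set one or more columns as the index of a DataFrame. By default, it returns a new DataFrame with the specified column(s) set as the index and leaves the original unchanged. Passing inplace=True, as in the cells that follow, modifies the DataFrame in place and returns None.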

In [24]:
df2

,day,temperature,windspeed,event
0,01-01-2020,32,6,Rain
1,01-02-2020,35,7,Sunny
2,01-03-2020,28,2,Snow
3,01-04-2020,24,7,Snow
4,01-05-2020,32,4,Rain
5,01-06-2020,32,2,Sunny


In [25]:
df2.index

RangeIndex(start=0, stop=6, step=1)

In [26]:
# Set the 'day' column as the index of the dataframe
df2.set_index("day", inplace=True)

In [27]:
df2

day,temperature,windspeed,event
01-01-2020,32,6,Rain
01-02-2020,35,7,Sunny
01-03-2020,28,2,Snow
01-04-2020,24,7,Snow
01-05-2020,32,4,Rain
01-06-2020,32,2,Sunny


In [28]:
# The 'loc' indexer accesses rows and columns by label(s) or a boolean array; here it selects the row labeled "01-01-2020"
df2.loc["01-01-2020"]

temperature      32
windspeed         6
event          Rain
Name: 01-01-2020, dtype: object
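
As an alternative to `inplace=True`, the result of set_index() can be assigned to a new variable, which keeps the original frame intact. The sketch below shows this along with two further uses of `loc` on a labeled index. The names `weather` and `by_day` are introduced only for this illustration.

```python
import pandas as pd

weather = pd.DataFrame({
    "day": ["01-01-2020", "01-02-2020", "01-03-2020",
            "01-04-2020", "01-05-2020", "01-06-2020"],
    "temperature": [32, 35, 28, 24, 32, 32],
})

# Without inplace=True, set_index returns a new frame; weather is unchanged.
by_day = weather.set_index("day")

# Select a single cell by row label and column name.
temp_jan_3 = by_day.loc["01-03-2020", "temperature"]
print(temp_jan_3)

# Label slices with loc include both endpoints.
jan_2_to_4 = by_day.loc["01-02-2020":"01-04-2020"]
print(jan_2_to_4)
```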

## 08. reset_index() Method
In Pandas, the reset_index() method is used to reset the index of a DataFrame to the default numbered index (0, 1, 2, ...). By default, the old index is moved back into the DataFrame as a regular column, which makes this method the natural way to undo a set_index() call.

In [29]:
df2.reset_index(inplace=True)

In [30]:
df2

,day,temperature,windspeed,event
0,01-01-2020,32,6,Rain
1,01-02-2020,35,7,Sunny
2,01-03-2020,28,2,Snow
3,01-04-2020,24,7,Snow
4,01-05-2020,32,4,Rain
5,01-06-2020,32,2,Sunny
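
If the old index is not needed afterwards, reset_index() can discard it through the `drop` parameter instead of turning it back into a column. A minimal sketch of both behaviors follows. The names `weather`, `by_day`, `restored`, and `dropped` are introduced only for this illustration.

```python
import pandas as pd

weather = pd.DataFrame({
    "day": ["01-01-2020", "01-02-2020", "01-03-2020"],
    "temperature": [32, 35, 28],
})
by_day = weather.set_index("day")

# Default: the old index comes back as a regular column.
restored = by_day.reset_index()

# drop=True: the old index is discarded instead of being kept.
dropped = by_day.reset_index(drop=True)

print(list(restored.columns))
print(list(dropped.columns))
```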
