### What is Pandas?
- [Pandas](https://pandas.pydata.org/docs/getting_started/index.html) is an open-source Python library that provides high-performance, widely used tools for data manipulation and analysis.
- It provides two main data structures: Series, which is a one-dimensional labeled array, and DataFrame, which is a two-dimensional table with labeled axes (rows and columns).
- These structures allow for efficient handling and analysis of structured data, such as CSV files, Excel sheets, and SQL databases.
- [Pandas](https://www.geeksforgeeks.org/introduction-to-pandas-in-python/) simplifies tasks such as data cleaning, filtering, aggregation, merging datasets, and handling time series data. 
- It's highly optimized and integrates well with other Python libraries, making it a go-to tool for data scientists and analysts.

#

### Why Use Pandas?
- Ease of Use: Simplifies data handling with intuitive structures like Series and DataFrames.
- Efficient Data Manipulation: Provides powerful functions for filtering, aggregating, and transforming data with minimal code.
- Handling Missing Data: Built-in methods for detecting and handling missing values, ensuring clean datasets.
- Integration with Other Libraries: Seamlessly works with NumPy, Matplotlib, Scikit-learn, and other Python libraries.
- Performance: Fast operations on large datasets due to its NumPy foundation and support for vectorized operations.
- Rich I/O Capabilities: Supports various data formats like CSV, Excel, SQL, and JSON.
- Community Support: Extensive documentation and an active community provide ample resources for learning and troubleshooting.
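
A minimal sketch of three of the points above: CSV input, missing-value handling, and vectorized arithmetic. The column names and values are made up for illustration, and `io.StringIO` stands in for a real file on disk.

```python
import io

import pandas as pd

# Hypothetical CSV content; the second row has no price
csv_text = "item,price,qty\npen,1.5,10\nbook,,2\nbag,20.0,1\n"
sales = pd.read_csv(io.StringIO(csv_text))

# Handling missing data: count the gaps, then fill them with a default
missing_prices = int(sales["price"].isna().sum())
sales["price"] = sales["price"].fillna(0.0)

# Vectorized operation: multiply whole columns without an explicit loop
sales["total"] = sales["price"] * sales["qty"]

print(missing_prices)
print(sales)
```

Filling missing prices with `0.0` is only one possible choice; dropping the rows with `dropna()` may suit other datasets better.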

#

### Installation
To install Pandas, you can use pip:

In [3]:
pip install pandas

Defaulting to user installation because normal site-packages is not writeable
Note: you may need to restart the kernel to use updated packages.


#

In [5]:
# import pandas
import pandas as pd

In [5]:
# Checking Pandas Version
print(pd.__version__)

2.2.2


#

### Series in Pandas
- A Pandas Series is like a column in a table.
- It is a one-dimensional array holding data of any type.

#### Creating a Series

In [6]:
# Creating a Series from a list
list_a = [1, 2, 3, 4, 5]

s = pd.Series(list_a)
print(s)
print(type(s))

0    1
1    2
2    3
3    4
4    5
dtype: int64
<class 'pandas.core.series.Series'>


##### Labels
- If nothing else is specified, the values are labeled with their index number: the first value has index 0, the second value has index 1, and so on.
- A label can be used to access the corresponding value.

In [7]:
# label can be used to access a specified value.
print(s[0])

1


In [8]:
# Creating a Series with custom index
si = pd.Series([1, 2, 3, 4, 5], index=['a', 'b', 'c', 'd', 'e'])
print(si)

a    1
b    2
c    3
d    4
e    5
dtype: int64


In [9]:
# Accessing the Series values using custom index labels
print(si['c'])
print(si['c':])

3
c    3
d    4
e    5
dtype: int64
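
The slice `si['c':]` above selects by label. As a small sketch of how label slices and position slices differ on a Series: `.loc` includes the end label, while `.iloc` excludes the stop position. The Series is recreated here so the block runs on its own.

```python
import pandas as pd

si = pd.Series([1, 2, 3, 4, 5], index=['a', 'b', 'c', 'd', 'e'])

# Label slice with .loc: the end label 'd' is included
by_label = si.loc['b':'d']

# Position slice with .iloc: the stop position 3 is excluded
by_position = si.iloc[1:3]

print(by_label.tolist())
print(by_position.tolist())
```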


# 

### DataFrame in Pandas
- A DataFrame is a two-dimensional, size-mutable, and potentially heterogeneous tabular data structure with labeled axes (rows and columns).
- Data is aligned in a tabular fashion in rows and columns.
- A Pandas DataFrame consists of three principal components: the data, the rows, and the columns.
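
As a rough illustration, the three components can be inspected separately. The two-row table below is hypothetical and used only for this sketch.

```python
import pandas as pd

# Hypothetical two-row table used only for this illustration
df = pd.DataFrame({"name": ["A", "B"], "score": [90, 85]})

print(df.index.tolist())    # row labels
print(df.columns.tolist())  # column labels
print(df.to_numpy())        # the underlying data as a NumPy array
```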

#### Creating a DataFrame

In [6]:
# Creating a DataFrame from a dictionary
emp= {"name":["Rohish","Smit","Priya"],"gender":["Male","Male","Female"], "email":["rohish@gmail.com","smit@gmail.com","priya@gmail.com"] }
emp

{'name': ['Rohish', 'Smit', 'Priya'],
 'gender': ['Male', 'Male', 'Female'],
 'email': ['rohish@gmail.com', 'smit@gmail.com', 'priya@gmail.com']}

In [7]:
df_emp = pd.DataFrame(emp)

In [16]:
df_emp

     name  gender             email
0  Rohish    Male  rohish@gmail.com
1    Smit    Male    smit@gmail.com
2   Priya  Female   priya@gmail.com


In [15]:
print(df_emp)

     name  gender             email
0  Rohish    Male  rohish@gmail.com
1    Smit    Male    smit@gmail.com
2   Priya  Female   priya@gmail.com


In [17]:
type(df_emp)

pandas.core.frame.DataFrame

In [24]:
# A single column of a DataFrame is a Series
type(df_emp['name'])

pandas.core.series.Series

# 

#### Viewing Data

In [28]:
# Display the first few rows
df_emp.head(1)

     name  gender             email
0  Rohish    Male  rohish@gmail.com


In [29]:
# Display the last few rows
df_emp.tail(2)

    name  gender            email
1   Smit    Male   smit@gmail.com
2  Priya  Female  priya@gmail.com


In [30]:
# Get the summary of the DataFrame
df_emp.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3 entries, 0 to 2
Data columns (total 3 columns):
 #   Column  Non-Null Count  Dtype 
---  ------  --------------  ----- 
 0   name    3 non-null      object
 1   gender  3 non-null      object
 2   email   3 non-null      object
dtypes: object(3)
memory usage: 204.0+ bytes


In [33]:
# Get descriptive statistics
df_emp.describe()

          name  gender             email
count        3       3                 3
unique       3       2                 3
top     Rohish    Male  rohish@gmail.com
freq         1       2                 1
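
The output above lists `count`, `unique`, `top`, and `freq` because every column in `df_emp` holds strings. For numeric columns, `describe()` reports summary statistics such as the mean and standard deviation instead. A small sketch with a hypothetical `age` column that is not part of `df_emp`:

```python
import pandas as pd

# Hypothetical numeric column, not part of df_emp
ages = pd.DataFrame({"age": [25, 30, 35]})

# For numeric data, describe() returns count, mean, std, min, quartiles, and max
stats = ages.describe()
print(stats)

# shape gives (number of rows, number of columns)
print(ages.shape)
```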


#

#### Indexing and Selecting Data
- Indexing in pandas means simply selecting particular rows and columns of data from a DataFrame. 
- Indexing could mean selecting all the rows and some of the columns, some of the rows and all of the columns, or some of each of the rows and columns.
- Indexing can also be known as Subset Selection.

In [34]:
# Selecting a column
df_emp['name']

0    Rohish
1      Smit
2     Priya
Name: name, dtype: object

In [37]:
# Selecting multiple columns
df_emp[['name', 'email']]

     name             email
0  Rohish  rohish@gmail.com
1    Smit    smit@gmail.com
2   Priya   priya@gmail.com


##### Indexing a DataFrame using .loc[ ]
- Label-Based Indexing
- loc is used for accessing a group of rows and columns by labels or a boolean array.
- It allows you to select data by specifying row and column labels.
- It is inclusive of both the start and end labels when slicing.

In [40]:
# Selecting a single row by label
df_emp.loc[0]

name                Rohish
gender                Male
email     rohish@gmail.com
Name: 0, dtype: object

In [42]:
# Selecting multiple rows and columns
df_emp.loc[[0,1],['name', 'gender']]

     name  gender
0  Rohish    Male
1    Smit    Male


In [49]:
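# Slicing rows and selecting specific columns
# (only the last expression in a cell is displayed)
df_emp.loc[7:4, 'name':'email'] # empty result: no row labels fall between 7 and 4
df_emp.loc[0:, 'name':]         # all rows, all columns from 'name' onward
df_emp.loc[0:1, 'name':'email'] # rows 0 through 1 inclusive, columns 'name' through 'email'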

     name  gender             email
0  Rohish    Male  rohish@gmail.com
1    Smit    Male    smit@gmail.com


In [53]:
# Using a boolean condition to select rows
df_emp.loc[df_emp['gender'] == 'Male']

     name  gender             email
0  Rohish    Male  rohish@gmail.com
1    Smit    Male    smit@gmail.com


#

##### Indexing a DataFrame using .iloc[]
- Integer-based indexing: Selects data by the integer positions of rows and columns, starting from 0.
- Exclusive: When slicing, the stop position is excluded, as with standard Python slices.
- Consistent: Works the same way regardless of the index type.
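
The difference between `.loc` and `.iloc` is easier to see when the row labels are not 0, 1, 2. The sketch below assumes a hypothetical frame whose index is 10, 20, 30.

```python
import pandas as pd

# Hypothetical frame whose row labels are 10, 20, 30 rather than 0, 1, 2
df = pd.DataFrame({"name": ["Rohish", "Smit", "Priya"]}, index=[10, 20, 30])

# .loc looks up the label 10, which is the first row
first_by_label = df.loc[10, "name"]

# .iloc looks up position 0, which is the same first row
first_by_position = df.iloc[0, 0]

# A label slice includes both ends; a position slice excludes the stop
rows_by_label = df.loc[10:20]
rows_by_position = df.iloc[0:1]

print(first_by_label, first_by_position)
print(len(rows_by_label), len(rows_by_position))
```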

In [54]:
# Selecting a single row by position
df_emp.iloc[0]

name                Rohish
gender                Male
email     rohish@gmail.com
Name: 0, dtype: object

In [61]:
# Selecting multiple rows and columns by position
df_emp.iloc[[0, 2], [0, 2]]

     name             email
0  Rohish  rohish@gmail.com
2   Priya   priya@gmail.com


In [68]:
# Slicing rows and selecting specific columns by position
df_emp.iloc[0:,0:2]

     name  gender
0  Rohish    Male
1    Smit    Male
2   Priya  Female


In [73]:
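# df_emp.iloc[1,2] would return the single value at row position 1, column position 2
# Rows at positions 1 and 2, columns at positions 0 and 1 (stop positions are excluded)
df_emp.iloc[1:3,0:2]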

    name  gender
1   Smit    Male
2  Priya  Female


In [75]:
# Selecting rows and columns with a list of positions
df_emp.iloc[[0, 1], [1, 2]]


   gender             email
0    Male  rohish@gmail.com
1    Male    smit@gmail.com


#

#### Setting a Custom Index
You can set a specific column as the index using set_index().

In [76]:
# Before setting up a custom index
df_emp

     name  gender             email
0  Rohish    Male  rohish@gmail.com
1    Smit    Male    smit@gmail.com
2   Priya  Female   priya@gmail.com


In [90]:
# Setting up a custom index
df_emp.set_index('email') # returns a new DataFrame; the original df_emp is not changed

                  index    name  gender
email
rohish@gmail.com      0  Rohish    Male
smit@gmail.com        1    Smit    Male
priya@gmail.com       2   Priya  Female


In [94]:
df_emp.set_index('email',inplace=True) # changes the original dataframe

In [95]:
df_emp

                  index    name  gender
email
rohish@gmail.com      0  Rohish    Male
smit@gmail.com        1    Smit    Male
priya@gmail.com       2   Priya  Female


In [96]:
# resetting the index
df_emp.reset_index(inplace=True)

In [97]:
df_emp

              email  index    name  gender
0  rohish@gmail.com      0  Rohish    Male
1    smit@gmail.com      1    Smit    Male
2   priya@gmail.com      2   Priya  Female
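
One reason to set a custom index is to look rows up by a meaningful label. The sketch below rebuilds the employee data under the hypothetical name `emp` so it runs on its own, and assigns the result of `set_index()` to a new variable instead of using `inplace=True`.

```python
import pandas as pd

emp = pd.DataFrame({
    "name": ["Rohish", "Smit", "Priya"],
    "gender": ["Male", "Male", "Female"],
    "email": ["rohish@gmail.com", "smit@gmail.com", "priya@gmail.com"],
})

# set_index returns a new DataFrame; emp keeps its default 0..2 index
by_email = emp.set_index("email")

# Rows can now be looked up by email address
smit_name = by_email.loc["smit@gmail.com", "name"]
print(smit_name)

# reset_index moves the index back into a regular column
restored = by_email.reset_index()
print(restored.columns.tolist())
```

Assigning the result keeps the original frame available, which can make a notebook easier to re-run cell by cell.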


#### Filtering Rows with a Boolean Mask

In [8]:
# Boolean mask: True where gender is 'Male'
fc = (df_emp['gender']=='Male')
fc

0     True
1     True
2    False
Name: gender, dtype: bool

In [9]:
df_emp[fc]

     name  gender             email
0  Rohish    Male  rohish@gmail.com
1    Smit    Male    smit@gmail.com


In [10]:
# ~ inverts the mask: True where gender is not 'Male'
fci = ~(df_emp['gender']=='Male')
fci

0    False
1    False
2     True
Name: gender, dtype: bool

In [11]:
df_emp[fci]

    name  gender            email
2  Priya  Female  priya@gmail.com
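
Boolean masks can also be combined. The sketch below rebuilds the employee data under the hypothetical name `emp` and shows two common patterns: joining conditions with `&`, and testing membership with `isin()`.

```python
import pandas as pd

emp = pd.DataFrame({
    "name": ["Rohish", "Smit", "Priya"],
    "gender": ["Male", "Male", "Female"],
    "email": ["rohish@gmail.com", "smit@gmail.com", "priya@gmail.com"],
})

# Combine conditions with & (and) or | (or); each condition needs its own parentheses
male_not_smit = emp[(emp["gender"] == "Male") & (emp["name"] != "Smit")]

# isin tests whether each value appears in the given list
selected = emp[emp["name"].isin(["Smit", "Priya"])]

print(male_not_smit["name"].tolist())
print(selected["name"].tolist())
```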
