# Pandas: Powerful Python Data Analysis Toolkit

![imagepandas.png](attachment:imagepandas.png)

### What is pandas?

pandas is a Python package for manipulating tabular data, that is, data arranged in rows and columns. 
pandas stores such a table in a DataFrame. Intuitively, you can think of a DataFrame as an 
Excel sheet.

pandas' functionality ranges from data transformations, like sorting rows and taking subsets, to calculating summary statistics such as the mean, reshaping DataFrames, and joining DataFrames together. pandas works well with other popular Python data science packages, often called the PyData ecosystem, including:

1. NumPy for numerical computing
2. Matplotlib, Seaborn, Plotly, and other data visualization packages
3. scikit-learn for machine learning

### What is pandas used for?

pandas is used throughout the data analysis workflow. With pandas, you can:

1. Import datasets from databases, spreadsheets, comma-separated values (CSV) files, and more.
2. Clean datasets, for example, by dealing with missing values.
3. Tidy datasets by reshaping their structure into a suitable format for analysis.
4. Aggregate data by calculating summary statistics such as the mean of columns, correlation between them, and more.
5. Visualize datasets and uncover insights.
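
A few of the steps above can be sketched in a handful of lines. This is only an illustration: the CSV text, the column names, and the choice to fill the missing value with the column mean are assumptions made for the example, not part of any real dataset.

```python
import io

import pandas as pd

# Hypothetical CSV content standing in for a real file on disk.
csv_text = "name,age,fare\nOwen,22,7.25\nWilliam,35,\nElizabeth,58,26.55\n"

# Import: read_csv accepts a file path or any file-like object.
passengers = pd.read_csv(io.StringIO(csv_text))

# Clean: replace the missing fare with the mean of the known fares.
passengers["fare"] = passengers["fare"].fillna(passengers["fare"].mean())

# Aggregate: compute a summary statistic for one column.
mean_age = passengers["age"].mean()

print(passengers)
print("mean age:", mean_age)
```

Filling with the mean is just one possible cleaning strategy; dropping the row with `dropna()` would be another.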

### Installing pandas

In [111]:
#https://pypi.org/project/pandas/

!pip install pandas

# In a notebook cell, the leading "!" runs the command in a shell.
# In a terminal or in the Anaconda Prompt, run the same command without it: pip install pandas



### Importing data in pandas

To start working with pandas, import the package. The community-agreed
alias for pandas is pd, so importing pandas as pd is assumed standard practice throughout the
pandas documentation.

In [112]:
import pandas as pd
#I want to start using pandas

In [113]:
data1={ "Name": ["Braund, Mr. Owen Harris","Allen, Mr. William Henry","Bonnell, Miss. Elizabeth"],
 "Age": [22, 35, 58],
 "Gender": ["male", "male", "female"]
}


In [114]:
data1

{'Name': ['Braund, Mr. Owen Harris',
  'Allen, Mr. William Henry',
  'Bonnell, Miss. Elizabeth'],
 'Age': [22, 35, 58],
 'Gender': ['male', 'male', 'female']}

In [115]:
type(data1)

dict

#### pandas data table representation

In [11]:
df=pd.DataFrame(data=data1)

In [12]:
df

Unnamed: 0,Name,Age,Gender
0,"Braund, Mr. Owen Harris",22,male
1,"Allen, Mr. William Henry",35,male
2,"Bonnell, Miss. Elizabeth",58,female


In [13]:
'''
When using a Python dictionary of lists, the dictionary keys
will be used as column headers and the values in each list as columns of the DataFrame


'''

'\nWhen using a Python dictionary of lists, the dictionary keys\nwill be used as column headers and the values in each list as columns of the DataFrame\n\n\n'
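
As a small sketch of the note above, the same table can also be built row by row from a list of dictionaries. The shortened names below are illustrative values, not the full passenger names used earlier.

```python
import pandas as pd

# Column-oriented: each key is a column header, each list holds that column's values.
by_column = pd.DataFrame({"Name": ["Owen", "William"], "Age": [22, 35]})

# Row-oriented alternative: each dictionary describes one row.
by_row = pd.DataFrame([
    {"Name": "Owen", "Age": 22},
    {"Name": "William", "Age": 35},
])

# Both constructions should produce the same table.
print(by_column.equals(by_row))
```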

### What is a DataFrame?

A DataFrame is a 2-dimensional data structure that can store data of different types (including characters, integers,
floating point values, categorical data and more) in columns. It is similar to a spreadsheet, a SQL table or the data.frame in R.
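
To make the "different types in columns" point concrete, here is a minimal sketch that inspects the type of each column. The `Fare` values are made up for illustration.

```python
import pandas as pd

people = pd.DataFrame({
    "Name": ["Owen", "William", "Elizabeth"],                     # text
    "Age": [22, 35, 58],                                          # integers
    "Fare": [7.25, 53.1, 26.55],                                  # floats (illustrative values)
    "Gender": pd.Categorical(["male", "male", "female"]),        # categorical
})

# dtypes lists the data type pandas chose for every column.
print(people.dtypes)
```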


In [14]:
# In spreadsheet software, the table representation of our data would look very similar

![image1.png](attachment:image1.png)

### What is a Series in pandas?

Each column in a DataFrame is a Series.

In [15]:
df

Unnamed: 0,Name,Age,Gender
0,"Braund, Mr. Owen Harris",22,male
1,"Allen, Mr. William Henry",35,male
2,"Bonnell, Miss. Elizabeth",58,female


In [20]:
type(df)

pandas.core.frame.DataFrame

In [17]:
test_series=df['Name']

In [18]:
test_series

0     Braund, Mr. Owen Harris
1    Allen, Mr. William Henry
2    Bonnell, Miss. Elizabeth
Name: Name, dtype: object

In [19]:
type(test_series)

pandas.core.series.Series

![image3.png](attachment:image3.png)

![image4.png](attachment:image4.png)

In [22]:
#You can create a Series from scratch as well:
ages = pd.Series([22, 35, 58], name="Age")


In [23]:
ages


0    22
1    35
2    58
Name: Age, dtype: int64

In [24]:
type(ages)

pandas.core.series.Series

### Do something with a DataFrame or Series

I want to know the maximum age of the passengers.
We can find it by selecting the Age column of the DataFrame and applying max():

In [25]:
df

Unnamed: 0,Name,Age,Gender
0,"Braund, Mr. Owen Harris",22,male
1,"Allen, Mr. William Henry",35,male
2,"Bonnell, Miss. Elizabeth",58,female


In [28]:
df['Age'].max()
#dataframe

58

In [118]:
#ages
#series

In [31]:
ages.max()

58

In [32]:
# The describe() method provides a quick overview of the numerical data in a DataFrame

In [33]:
df

Unnamed: 0,Name,Age,Gender
0,"Braund, Mr. Owen Harris",22,male
1,"Allen, Mr. William Henry",35,male
2,"Bonnell, Miss. Elizabeth",58,female


In [36]:
df.describe()

# As the Name and Gender
# columns are textual data, these are by default not taken into account by the describe() method

Unnamed: 0,Age
count,3.0
mean,38.333333
std,18.230012
min,22.0
25%,28.5
50%,35.0
75%,46.5
max,58.0
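
If the text columns should be summarised as well, `describe()` accepts an `include` argument. The sketch below rebuilds the small passenger table and passes `include="all"`; treat it as one possible way to get an overview rather than the only one.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Braund, Mr. Owen Harris", "Allen, Mr. William Henry", "Bonnell, Miss. Elizabeth"],
    "Age": [22, 35, 58],
    "Gender": ["male", "male", "female"],
})

# include="all" adds count, unique, top and freq rows for the text columns.
summary = df.describe(include="all")
print(summary)
```

Cells that do not apply to a column, such as the mean of `Name`, are shown as NaN.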


In [None]:
# Let's practice creating a DataFrame



In [1]:
country = ["Brazil", "Russia", "India", "China", "South Africa"]
capital = ["Brasilia", "Moscow", "New Delhi", "Beijing", "Pretoria"]
area = [8.516, 17.10, 3.286, 9.67, 2.98]
population = [200.4, 143.5, 1252, 133, 4.3] 

In [4]:
brics=pd.DataFrame(data={'country': country, 
                  'capital': capital,
                  'area': area,
                  'population': population})

In [5]:
brics

Unnamed: 0,country,capital,area,population
0,Brazil,Brasilia,8.516,200.4
1,Russia,Moscow,17.1,143.5
2,India,New Delhi,3.286,1252.0
3,China,Beijing,9.67,133.0
4,South Africa,Pretoria,2.98,4.3


In [6]:
type(brics)

pandas.core.frame.DataFrame

#### shape

In [7]:
brics.shape
#Return a tuple representing the dimensionality of the DataFrame.

(5, 4)

#### index

In [8]:
brics.index

RangeIndex(start=0, stop=5, step=1)

In [9]:
# Let's change the index

brics.index = ["A", "B", "C", "D", "E"]

In [48]:
brics

Unnamed: 0,country,capital,area,population
A,Brazil,Brasilia,8.516,200.4
B,Russia,Moscow,17.1,143.5
C,India,New Delhi,3.286,1252.0
D,China,Beijing,9.67,133.0
E,South Africa,Pretoria,2.98,4.3


In [10]:
brics.index = range(1,10,2)

In [11]:
brics

Unnamed: 0,country,capital,area,population
1,Brazil,Brasilia,8.516,200.4
3,Russia,Moscow,17.1,143.5
5,India,New Delhi,3.286,1252.0
7,China,Beijing,9.67,133.0
9,South Africa,Pretoria,2.98,4.3


In [12]:
brics.columns #retrieves the column names

Index(['country', 'capital', 'area', 'population'], dtype='object')

In [13]:
c = brics.country #column series

In [14]:
type(c)

pandas.core.series.Series

In [15]:
brics

Unnamed: 0,country,capital,area,population
1,Brazil,Brasilia,8.516,200.4
3,Russia,Moscow,17.1,143.5
5,India,New Delhi,3.286,1252.0
7,China,Beijing,9.67,133.0
9,South Africa,Pretoria,2.98,4.3


In [16]:
brics.country

1          Brazil
3          Russia
5           India
7           China
9    South Africa
Name: country, dtype: object

In [17]:
brics['country']

1          Brazil
3          Russia
5           India
7           China
9    South Africa
Name: country, dtype: object

In [18]:
brics[['country']]

Unnamed: 0,country
1,Brazil
3,Russia
5,India
7,China
9,South Africa
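
The difference between the last two cells is easy to miss: single brackets return a Series, while double brackets (a list of column names) return a DataFrame. A minimal sketch, using a cut-down copy of the table:

```python
import pandas as pd

brics = pd.DataFrame({
    "country": ["Brazil", "Russia", "India", "China", "South Africa"],
    "area": [8.516, 17.10, 3.286, 9.67, 2.98],
})

as_series = brics["country"]    # one label -> Series (1-dimensional)
as_frame = brics[["country"]]   # list of labels -> DataFrame (2-dimensional)

print(type(as_series).__name__, as_series.shape)
print(type(as_frame).__name__, as_frame.shape)
```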


In [19]:
population_datatype=brics.population

In [21]:
population_datatype.dtype

dtype('float64')

In [22]:
brics.index

RangeIndex(start=1, stop=10, step=2)

In [23]:
brics

Unnamed: 0,country,capital,area,population
1,Brazil,Brasilia,8.516,200.4
3,Russia,Moscow,17.1,143.5
5,India,New Delhi,3.286,1252.0
7,China,Beijing,9.67,133.0
9,South Africa,Pretoria,2.98,4.3


In [24]:
brics.set_index("country")

country,capital,area,population
Brazil,Brasilia,8.516,200.4
Russia,Moscow,17.1,143.5
India,New Delhi,3.286,1252.0
China,Beijing,9.67,133.0
South Africa,Pretoria,2.98,4.3


In [26]:
brics
# The original DataFrame is unchanged: set_index() returned a new DataFrame

Unnamed: 0,country,capital,area,population
1,Brazil,Brasilia,8.516,200.4
3,Russia,Moscow,17.1,143.5
5,India,New Delhi,3.286,1252.0
7,China,Beijing,9.67,133.0
9,South Africa,Pretoria,2.98,4.3


In [27]:
brics = brics.set_index("country")

In [28]:
brics

country,capital,area,population
Brazil,Brasilia,8.516,200.4
Russia,Moscow,17.1,143.5
India,New Delhi,3.286,1252.0
China,Beijing,9.67,133.0
South Africa,Pretoria,2.98,4.3


In [29]:
brics.reset_index() # returns a new DataFrame; brics is unchanged unless you pass inplace=True

Unnamed: 0,country,capital,area,population
0,Brazil,Brasilia,8.516,200.4
1,Russia,Moscow,17.1,143.5
2,India,New Delhi,3.286,1252.0
3,China,Beijing,9.67,133.0
4,South Africa,Pretoria,2.98,4.3


In [30]:
brics

country,capital,area,population
Brazil,Brasilia,8.516,200.4
Russia,Moscow,17.1,143.5
India,New Delhi,3.286,1252.0
China,Beijing,9.67,133.0
South Africa,Pretoria,2.98,4.3


In [31]:
brics.reset_index(inplace = True) # inplace=True modifies brics itself and returns None

In [32]:
brics

Unnamed: 0,country,capital,area,population
0,Brazil,Brasilia,8.516,200.4
1,Russia,Moscow,17.1,143.5
2,India,New Delhi,3.286,1252.0
3,China,Beijing,9.67,133.0
4,South Africa,Pretoria,2.98,4.3


In [None]:
brics = brics.set_index("country")

In [36]:
brics

country,capital,area,population
Brazil,Brasilia,8.516,200.4
Russia,Moscow,17.1,143.5
India,New Delhi,3.286,1252.0
China,Beijing,9.67,133.0
South Africa,Pretoria,2.98,4.3


In [38]:
brics.set_index("capital", inplace = True)

In [39]:
brics

capital,area,population
Brasilia,8.516,200.4
Moscow,17.1,143.5
New Delhi,3.286,1252.0
Beijing,9.67,133.0
Pretoria,2.98,4.3


In [45]:
# brics = brics.set_index("area", inplace = True)

In [48]:
brics
# inplace=True modifies brics in place and returns None, so do not
# assign the result back to brics when you pass inplace=True.
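
The pitfall described in the comment above can be checked directly. This sketch uses a two-row stand-in for the table and a separate variable, `result`, so that the DataFrame itself is not lost:

```python
import pandas as pd

brics = pd.DataFrame({
    "capital": ["Brasilia", "Moscow"],
    "area": [8.516, 17.10],
})

# With inplace=True the method changes brics and returns None.
result = brics.set_index("capital", inplace=True)

print(result)            # the return value, not a DataFrame
print(brics.index.name)  # brics itself now uses "capital" as its index
```

Had the result been assigned back to `brics`, the variable would hold `None` and the table would be gone.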

In [49]:
country = ["Brazil", "Russia", "India", "China", "South Africa"]
capital = ["Brasilia", "Moscow", "New Delhi", "Beijing", "Pretoria"]
area = [8.516, 17.10, 3.286, 9.67, 2.98]
population = [200.4, 143.5, 1252, 133, 4.3] 

In [50]:
brics=pd.DataFrame(data={'country': country, 
                  'capital': capital,
                  'area': area,
                  'population': population})

In [51]:
brics

Unnamed: 0,country,capital,area,population
0,Brazil,Brasilia,8.516,200.4
1,Russia,Moscow,17.1,143.5
2,India,New Delhi,3.286,1252.0
3,China,Beijing,9.67,133.0
4,South Africa,Pretoria,2.98,4.3


In [52]:
brics.index

RangeIndex(start=0, stop=5, step=1)

In [54]:
# brics.index[3] = "China"

In [55]:
brics.columns

Index(['country', 'capital', 'area', 'population'], dtype='object')

In [56]:
brics.columns = range(len(brics.columns))

In [57]:
brics

Unnamed: 0,0,1,2,3
0,Brazil,Brasilia,8.516,200.4
1,Russia,Moscow,17.1,143.5
2,India,New Delhi,3.286,1252.0
3,China,Beijing,9.67,133.0
4,South Africa,Pretoria,2.98,4.3


In [58]:
# brics.columns[1] = "Capital"

The commented-out assignment above raises "TypeError: Index does not support mutable operations". pandas Index objects are immutable, so a single column name cannot be changed through item assignment. Use rename() or assign a complete new list of labels to brics.columns instead.

In [59]:
import pandas as pd

# Creating a sample DataFrame
brics = pd.DataFrame({
    'Country': ['Brazil', 'Russia', 'India', 'China', 'South Africa'],
    'Population': [209288278, 144421022, 1393409038, 1397715000, 57779622],
    'Area': [8515767, 17098242, 3287263, 9596960, 1221037]
})

# Display the original DataFrame
print("Original DataFrame:")
print(brics)

# Rename the 'Population' column to 'Capital'
brics = brics.rename(columns={'Population': 'Capital'})

# Display the DataFrame after renaming the column
print("\nDataFrame after renaming the 'Population' column to 'Capital':")
print(brics)


Original DataFrame:
        Country  Population      Area
0        Brazil   209288278   8515767
1        Russia   144421022  17098242
2         India  1393409038   3287263
3         China  1397715000   9596960
4  South Africa    57779622   1221037

DataFrame after renaming the 'Population' column to 'Capital':
        Country     Capital      Area
0        Brazil   209288278   8515767
1        Russia   144421022  17098242
2         India  1393409038   3287263
3         China  1397715000   9596960
4  South Africa    57779622   1221037
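
The failing assignment and the `rename()` route can be placed side by side. The sketch below uses a one-row table and catches the error so the cell runs to completion; the exact wording of the error message may vary between pandas versions.

```python
import pandas as pd

brics = pd.DataFrame({"country": ["Brazil"], "capital": ["Brasilia"]})

# Item assignment on an Index is rejected because Index objects are immutable.
try:
    brics.columns[1] = "Capital"
except TypeError as err:
    print("TypeError:", err)

# rename() returns a new DataFrame with the requested label changed.
renamed = brics.rename(columns={"capital": "Capital"})

print(list(brics.columns))    # original labels are untouched
print(list(renamed.columns))
```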


In [68]:
country = ["Brazil", "Russia", "India", "China", "South Africa"]
capital = ["Brasilia", "Moscow", "New Delhi", "Beijing", "Pretoria"]
area = [8.516, 17.10, 3.286, 9.67, 2.98]
population = [200.4, 143.5, 1252, 133, 4.3] 

In [70]:
brics=pd.DataFrame(data={'country': country, 
                  'capital': capital,
                  'area': area,
                  'population': population})

In [71]:
brics

Unnamed: 0,country,capital,area,population
0,Brazil,Brasilia,8.516,200.4
1,Russia,Moscow,17.1,143.5
2,India,New Delhi,3.286,1252.0
3,China,Beijing,9.67,133.0
4,South Africa,Pretoria,2.98,4.3


In [72]:
brics.index

RangeIndex(start=0, stop=5, step=1)

In [74]:
# brics.index[3] = "China"

In [75]:
brics.columns

Index(['country', 'capital', 'area', 'population'], dtype='object')

In [76]:
brics.columns = range(len(brics.columns))

In [77]:
brics

Unnamed: 0,0,1,2,3
0,Brazil,Brasilia,8.516,200.4
1,Russia,Moscow,17.1,143.5
2,India,New Delhi,3.286,1252.0
3,China,Beijing,9.67,133.0
4,South Africa,Pretoria,2.98,4.3


In [79]:
# brics.columns[1] = "Capital"

In [80]:
type(brics.columns)

pandas.core.indexes.range.RangeIndex

In [81]:
b = list(brics.columns)

In [82]:
b[1] = "Capital"

In [83]:
brics.columns = b

In [86]:
brics

Unnamed: 0,0,Capital,2,3
0,Brazil,Brasilia,8.516,200.4
1,Russia,Moscow,17.1,143.5
2,India,New Delhi,3.286,1252.0
3,China,Beijing,9.67,133.0
4,South Africa,Pretoria,2.98,4.3


In [87]:
country = ["Brazil", "Russia", "India", "China", "South Africa"]
capital = ["Brasilia", "Moscow", "New Delhi", "Beijing", "Pretoria"]
area = [8.516, 17.10, 3.286, 9.67, 2.98]
population = [200.4, 143.5, 1252, 133, 4.3] 

In [88]:
brics=pd.DataFrame(data={'country': country, 
                  'capital': capital,
                  'area': area,
                  'population': population})

In [89]:
brics


Unnamed: 0,country,capital,area,population
0,Brazil,Brasilia,8.516,200.4
1,Russia,Moscow,17.1,143.5
2,India,New Delhi,3.286,1252.0
3,China,Beijing,9.67,133.0
4,South Africa,Pretoria,2.98,4.3


In [90]:
b=brics.rename(columns={'country': 'count', 'area': 'ar'})

In [92]:
brics

Unnamed: 0,country,capital,area,population
0,Brazil,Brasilia,8.516,200.4
1,Russia,Moscow,17.1,143.5
2,India,New Delhi,3.286,1252.0
3,China,Beijing,9.67,133.0
4,South Africa,Pretoria,2.98,4.3


In [93]:
b

Unnamed: 0,count,capital,ar,population
0,Brazil,Brasilia,8.516,200.4
1,Russia,Moscow,17.1,143.5
2,India,New Delhi,3.286,1252.0
3,China,Beijing,9.67,133.0
4,South Africa,Pretoria,2.98,4.3


In [94]:
a=b.rename(index={2: "India", 4: "SA"})

In [95]:
a

Unnamed: 0,count,capital,ar,population
0,Brazil,Brasilia,8.516,200.4
1,Russia,Moscow,17.1,143.5
India,India,New Delhi,3.286,1252.0
3,China,Beijing,9.67,133.0
SA,South Africa,Pretoria,2.98,4.3


In [96]:
a.rename(mapper = str.upper, axis = 1) #axis = 1 is for columns and axis = 0 is for rows

Unnamed: 0,COUNT,CAPITAL,AR,POPULATION
0,Brazil,Brasilia,8.516,200.4
1,Russia,Moscow,17.1,143.5
India,India,New Delhi,3.286,1252.0
3,China,Beijing,9.67,133.0
SA,South Africa,Pretoria,2.98,4.3
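
Besides `mapper` plus `axis`, `rename()` also accepts the `columns=` and `index=` keywords, which some readers find easier to follow. A short sketch on a two-row table (the `+ 10` shift is an arbitrary example function):

```python
import pandas as pd

frame = pd.DataFrame({"count": ["Brazil", "Russia"], "ar": [8.516, 17.10]})

# Two equivalent ways to apply a function to every column label.
upper_axis = frame.rename(mapper=str.upper, axis=1)
upper_kw = frame.rename(columns=str.upper)

# A function can be applied to the row labels in the same way.
shifted = frame.rename(index=lambda label: label + 10)

print(list(upper_axis.columns))
print(list(shifted.index))
```

None of these calls changes `frame`, because `inplace` was not set.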


In [97]:
b

Unnamed: 0,count,capital,ar,population
0,Brazil,Brasilia,8.516,200.4
1,Russia,Moscow,17.1,143.5
2,India,New Delhi,3.286,1252.0
3,China,Beijing,9.67,133.0
4,South Africa,Pretoria,2.98,4.3


In [100]:
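b.rename(mapper = lambda x: x**2, axis = 0, inplace = True)
# Each run squares the current row labels. The labels shown below (0, 1, 16, 81, 256)
# are fourth powers, which suggests this cell was executed twice.

In [99]:
# rename() changes only row and column labels, never the data values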

In [101]:
b

Unnamed: 0,count,capital,ar,population
0,Brazil,Brasilia,8.516,200.4
1,Russia,Moscow,17.1,143.5
16,India,New Delhi,3.286,1252.0
81,China,Beijing,9.67,133.0
256,South Africa,Pretoria,2.98,4.3


In [102]:
b.reset_index(inplace = True)

In [103]:
b

Unnamed: 0,index,count,capital,ar,population
0,0,Brazil,Brasilia,8.516,200.4
1,1,Russia,Moscow,17.1,143.5
2,16,India,New Delhi,3.286,1252.0
3,81,China,Beijing,9.67,133.0
4,256,South Africa,Pretoria,2.98,4.3


In [104]:
b.drop(labels = ["index"], axis = 1, inplace = True)

In [105]:
b

Unnamed: 0,count,capital,ar,population
0,Brazil,Brasilia,8.516,200.4
1,Russia,Moscow,17.1,143.5
2,India,New Delhi,3.286,1252.0
3,China,Beijing,9.67,133.0
4,South Africa,Pretoria,2.98,4.3


In [106]:
b.set_index("count", inplace=True)

In [107]:
b

count,capital,ar,population
Brazil,Brasilia,8.516,200.4
Russia,Moscow,17.1,143.5
India,New Delhi,3.286,1252.0
China,Beijing,9.67,133.0
South Africa,Pretoria,2.98,4.3


In [108]:
b.reset_index(inplace = True)

In [109]:
b = b.reset_index(drop = True)

In [110]:
b

Unnamed: 0,count,capital,ar,population
0,Brazil,Brasilia,8.516,200.4
1,Russia,Moscow,17.1,143.5
2,India,New Delhi,3.286,1252.0
3,China,Beijing,9.67,133.0
4,South Africa,Pretoria,2.98,4.3
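
The two `reset_index()` calls above behave differently: by default the old index is kept as a new column, while `drop=True` discards it. A minimal sketch with an unnamed string index:

```python
import pandas as pd

frame = pd.DataFrame({"capital": ["Brasilia", "Moscow"]}, index=["Brazil", "Russia"])

kept = frame.reset_index()              # old index becomes a column named "index"
dropped = frame.reset_index(drop=True)  # old index is thrown away

print(list(kept.columns))
print(list(dropped.columns))
```

If the index has a name, for example after `set_index("count")`, the restored column takes that name instead of `"index"`.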
