![Pandas_logo.svg.png](attachment:Pandas_logo.svg.png)

# Pandas
Pandas is a popular Python library used for data manipulation and analysis.

* It features two main data structures:
    * the `Series` object, which represents a single column of data, that is, the observations of a single variable,
    * the `DataFrame` object, which contains multiple columns of data.

In [3]:
! pip install pandas



In [13]:
import pandas as pd
import numpy as np

# Pandas Series
A Pandas Series is a fundamental component of the Pandas library: a one-dimensional labeled array that can hold values of any data type (integers, floats, strings, Python objects). Think of it as a single column in a spreadsheet, providing a structured way to organize and analyze data.
<br/>
![9788b6e2.webp](attachment:9788b6e2.webp)
    


# Index
In Pandas, the index acts like an address for each row: a label, or set of labels, used to find and work with specific rows, much like a primary key in a database table (although, unlike a primary key, index labels are not required to be unique). You can either specify the index yourself or let Pandas create one for you. The default index is a range of integers starting from 0, like row numbers.

* Create a Pandas Series from a list or a NumPy array

In [7]:
prices  = [10,20,46,14,25,98,63,64,79,58,26]

In [8]:
type(prices)

list

In [11]:
series_1 = pd.Series(prices)
series_1

0     10
1     20
2     46
3     14
4     25
5     98
6     63
7     64
8     79
9     58
10    26
dtype: int64

In [14]:
type(series_1)

pandas.core.series.Series

In [15]:
nd_array = np.array([10,20,35,64,8,4,96,45,54,6,8,7,1])
nd_array

array([10, 20, 35, 64,  8,  4, 96, 45, 54,  6,  8,  7,  1])

In [16]:
type(nd_array)

numpy.ndarray

In [18]:
series_2 = pd.Series(nd_array)
series_2

0     10
1     20
2     35
3     64
4      8
5      4
6     96
7     45
8     54
9      6
10     8
11     7
12     1
dtype: int32

In [19]:
type(series_2)

pandas.core.series.Series

In [23]:
# explore the Series index; by default it is a range starting from 0
list(series_2.index)

[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]

In [24]:
# explore the values of the Series
series_2.values

array([10, 20, 35, 64,  8,  4, 96, 45, 54,  6,  8,  7,  1])

In [87]:
# copy=False asks pandas to reuse the NumPy array's memory instead of copying it
nd_array = np.array([1,2,3,5])
series_copy = pd.Series(nd_array , copy = False)

In [88]:
series_copy[0]

1

In [84]:
series_copy[0] = 99 

In [85]:
series_copy

0    99
1     2
2     3
3     5
dtype: int32

In [86]:
nd_array

array([1, 2, 3, 5])
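
The execution counts of the cells above (87, 88, 84, 85, 86) show that they were run out of order, so the printed `nd_array` does not necessarily reflect the assignment. A more direct check is to ask NumPy whether the two objects share a buffer. The following is a minimal sketch; the names `arr`, `shared`, and `copied` are illustrative and not part of the notebook. Whether a later write through the Series reaches the array can also depend on the pandas version and its Copy-on-Write settings.

```python
import numpy as np
import pandas as pd

arr = np.array([1, 2, 3, 5])

# copy=False asks pandas to reuse the array's buffer instead of copying it
shared = pd.Series(arr, copy=False)
# copy=True forces an independent copy of the data
copied = pd.Series(arr, copy=True)

# np.shares_memory reports whether two arrays overlap in memory
shares_buffer = np.shares_memory(arr, shared.to_numpy())
copy_shares_buffer = np.shares_memory(arr, copied.to_numpy())
print(shares_buffer, copy_shares_buffer)
```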

In [112]:
# Create a Series from a dictionary: the keys become the index, the values become the data
workdays={"day1":"sun","day2":"mon","day3":"tue"}
s = pd.Series(workdays)
print("Series: \n")
print(s)

Series: 

day1    sun
day2    mon
day3    tue
dtype: object


In [113]:
s = pd.Series(workdays,index=["day1","day3"])
print("Series: \n")
print(s)

Series: 

day1    sun
day3    tue
dtype: object
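
When an `index` is passed together with a dictionary, pandas looks each label up in the dictionary, so the index both selects and orders the entries. A label with no matching key yields a missing value rather than an error. A small sketch, where `"day4"` is a hypothetical label chosen only because it is absent from `workdays`:

```python
import pandas as pd

workdays = {"day1": "sun", "day2": "mon", "day3": "tue"}

# "day4" is not a key in the dictionary, so its value is missing (NaN)
s_missing = pd.Series(workdays, index=["day1", "day4"])
print(s_missing)
```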


In [26]:
# define specific index for the pandas series
l = [44,5,66,2,2,33,65,4,5]
index = ["mohamed","ahmed","mahmoud","karim","majad","yasmine","farah","gad","youseef"]
series_3 = pd.Series(data= l , index = index )
series_3

mohamed    44
ahmed       5
mahmoud    66
karim       2
majad       2
yasmine    33
farah      65
gad         4
youseef     5
dtype: int64

In [27]:
series_3.index

Index(['mohamed', 'ahmed', 'mahmoud', 'karim', 'majad', 'yasmine', 'farah',
       'gad', 'youseef'],
      dtype='object')

In [28]:
series_3.values

array([44,  5, 66,  2,  2, 33, 65,  4,  5], dtype=int64)

In [29]:
# explore series shape (size,)
series_3.shape

(9,)

In [30]:
# explore all information (data type , memory usage , index , non-null values count) about series
series_3.info()

<class 'pandas.core.series.Series'>
Index: 9 entries, mohamed to youseef
Series name: None
Non-Null Count  Dtype
--------------  -----
9 non-null      int64
dtypes: int64(1)
memory usage: 444.0+ bytes


In [31]:
# count the occurrences of each value in the series
series_3.value_counts()

5     2
2     2
44    1
66    1
33    1
65    1
4     1
dtype: int64

In [33]:
# explore the summary statistics of the series
series_3.describe()

count     9.000000
mean     25.111111
std      27.397283
min       2.000000
25%       4.000000
50%       5.000000
75%      44.000000
max      66.000000
dtype: float64

In [35]:
# explore unique values of the series
series_3.unique()

array([44,  5, 66,  2, 33, 65,  4], dtype=int64)

In [37]:
# statistical measures of the series
series_3.count()

9

In [38]:
series_3.sum()

226

In [39]:
series_3.mean()

25.11111111111111

In [42]:
series_3.mode()

0    2
1    5
dtype: int64

In [43]:
series_3.std()

27.397282914754722

In [45]:
series_3.var()

750.6111111111111

* Access Series elements using the index

In [92]:
nd_array = np.array([10,20,35,64,8,4,96,45,54,6,8,7,1])
series_4 = pd.Series(nd_array)
series_4

0     10
1     20
2     35
3     64
4      8
5      4
6     96
7     45
8     54
9      6
10     8
11     7
12     1
dtype: int32

In [93]:
series_4[0]

10

In [94]:
series_4[1]

20

In [95]:
series_4[0:4]

0    10
1    20
2    35
3    64
dtype: int32

In [96]:
series_4[-1]

KeyError: -1

In [106]:
series_4.iloc[-2]

7
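
The `KeyError` above occurs because, with an integer index, `series_4[-1]` is looked up as the label `-1`, which does not exist. `.iloc` always works by position, so negative positions behave as they do for Python lists. A short sketch on the same values (the variable names are illustrative):

```python
import numpy as np
import pandas as pd

s = pd.Series(np.array([10, 20, 35, 64, 8, 4, 96, 45, 54, 6, 8, 7, 1]))

last = s.iloc[-1]          # last element by position
second_last = s.iloc[-2]   # same lookup as in the cell above
tail = s.iloc[-3:]         # last three elements; their original labels are kept

try:
    s[-1]                  # treated as a label lookup, and no label -1 exists
    raised = False
except KeyError:
    raised = True
```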

In [52]:
l = [44,5,66,2,2,33,65,4,5]
index = ["mohamed","ahmed","mahmoud","karim","majad","yasmine","farah","gad","youseef"]
series_5 = pd.Series(data= l , index = index )
series_5

mohamed    44
ahmed       5
mahmoud    66
karim       2
majad       2
yasmine    33
farah      65
gad         4
youseef     5
dtype: int64

In [58]:
series_5[1]

5

In [59]:
series_5[-1]

5

In [60]:
series_5['mahmoud']

66

In [109]:
series_5.loc['mahmoud']

66
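
With a string index, `series_5[1]` and `series_5[-1]` fall back to positional access, as the outputs above show. Recent pandas releases deprecate that fallback, so being explicit is the safer habit: `.iloc` for positions and `.loc` for labels. A minimal sketch with the same data (variable names are illustrative):

```python
import pandas as pd

values = [44, 5, 66, 2, 2, 33, 65, 4, 5]
names = ["mohamed", "ahmed", "mahmoud", "karim", "majad",
         "yasmine", "farah", "gad", "youseef"]
s = pd.Series(values, index=names)

by_position = s.iloc[1]                 # second element
last = s.iloc[-1]                       # last element
by_label = s.loc["mahmoud"]             # element labeled "mahmoud"
label_slice = s.loc["ahmed":"karim"]    # label slices include both endpoints
```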

* Arithmetic operations

In [63]:
series_6 = pd.Series(data=[1,2,3,4])
series_7 = pd.Series(data=[2,3,5,6])

In [64]:
series_6+series_7

0     3
1     5
2     8
3    10
dtype: int64

In [65]:
series_6*series_7

0     2
1     6
2    15
3    24
dtype: int64

In [66]:
series_6/series_7

0    0.500000
1    0.666667
2    0.600000
3    0.666667
dtype: float64

In [67]:
series_6 = pd.Series(data=[1,2,3,4,5])
series_7 = pd.Series(data=[2,3,5,6])

In [68]:
series_6+series_7

0     3.0
1     5.0
2     8.0
3    10.0
4     NaN
dtype: float64

In [72]:
series_6 = pd.Series(data=[1,2,3,4],index = ['mohamed','ahmed','karim','majad'])
series_7 = pd.Series(data=[2,3,5,6])

In [73]:
series_6+series_7

0         NaN
1         NaN
2         NaN
3         NaN
ahmed     NaN
karim     NaN
majad     NaN
mohamed   NaN
dtype: float64
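
The `NaN` values above come from index alignment: arithmetic between two Series matches elements by label, the result carries the union of both indexes, and any label present on only one side produces `NaN`. Two common ways around this are sketched below, using `Series.add` with `fill_value` and adding a plain NumPy array to ignore labels. The variable names are illustrative.

```python
import pandas as pd

a = pd.Series([1, 2, 3, 4, 5])
b = pd.Series([2, 3, 5, 6])

# A label missing from one side is treated as 0 instead of producing NaN
filled = a.add(b, fill_value=0)

named = pd.Series([1, 2, 3, 4], index=["mohamed", "ahmed", "karim", "majad"])
plain = pd.Series([2, 3, 5, 6])

# Dropping the labels of the second Series adds element by element, by position
positional = named + plain.to_numpy()
```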

In [75]:
series_6 = pd.Series(data=[1,2,3,4],index = ['mohamed','ahmed','karim','majad'])
series_7 = pd.Series(data=[2,3,5,6],index = ['mohamed','ahmed','karim','majad'])

In [76]:
series_6+series_7

mohamed     3
ahmed       5
karim       8
majad      10
dtype: int64

# DataFrame
A DataFrame is a two-dimensional labeled data structure whose columns can have different types. It is similar to a spreadsheet or a SQL table.
<br/>
![dataframe.webp](attachment:dataframe.webp)

In [117]:
# Create pandas DataFrame from dictionary of lists
data = {
    'Name': ['Emma', 'Oliver', 'Harry', 'Sophia'], 
    'Age': [29, 25, 33, 24],
    'Department': ['HR', 'Finance', 'Marketing', 'IT']}

df_1 = pd.DataFrame(data = data)
df_1

|   | Name | Age | Department |
|---|------|-----|------------|
| 0 | Emma | 29 | HR |
| 1 | Oliver | 25 | Finance |
| 2 | Harry | 33 | Marketing |
| 3 | Sophia | 24 | IT |


In [118]:
# Create a pandas DataFrame from a dictionary of NumPy arrays
np_array = np.array(
    [['Emma', 'Oliver', 'Harry', 'Sophia'],
     [29, 25, 33, 24],
     ['HR', 'Finance', 'Marketing', 'IT']])

data = {
    'Name': np_array[0],
    'Age': np_array[1],
    'Department': np_array[2]}

df_2 = pd.DataFrame(data = data )
df_2

|   | Name | Age | Department |
|---|------|-----|------------|
| 0 | Emma | 29 | HR |
| 1 | Oliver | 25 | Finance |
| 2 | Harry | 33 | Marketing |
| 3 | Sophia | 24 | IT |
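
One caveat with this approach: a NumPy array has a single dtype, so mixing names and numbers in one array turns every element, including the ages, into a string. The table looks the same, but `Age` is not numeric until it is converted. A minimal sketch of the check and the conversion (the variable names are illustrative):

```python
import numpy as np
import pandas as pd

np_array = np.array(
    [['Emma', 'Oliver', 'Harry', 'Sophia'],
     [29, 25, 33, 24],
     ['HR', 'Finance', 'Marketing', 'IT']])

df = pd.DataFrame({'Name': np_array[0],
                   'Age': np_array[1],
                   'Department': np_array[2]})

# The ages were stored as strings inside the mixed NumPy array
age_is_numeric_before = pd.api.types.is_numeric_dtype(df['Age'])

# Convert the column to integers so that numeric operations work
df['Age'] = df['Age'].astype(int)
age_is_numeric_after = pd.api.types.is_numeric_dtype(df['Age'])
mean_age = df['Age'].mean()
```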


In [123]:
# Create a pandas DataFrame from a NumPy array
np_array = np.array(
    [['Emma', 'Oliver', 'Harry', 'Sophia'],
     [29, 25, 33, 24],
     ['HR', 'Finance', 'Marketing', 'IT']])

# To transpose the numpy array
np_array = np_array.T

df_2 = pd.DataFrame(data = np_array , columns = ['Name','Age','Department'])
df_2

|   | Name | Age | Department |
|---|------|-----|------------|
| 0 | Emma | 29 | HR |
| 1 | Oliver | 25 | Finance |
| 2 | Harry | 33 | Marketing |
| 3 | Sophia | 24 | IT |


In [124]:
# Create a pandas DataFrame from a list of lists
data_rows = [
    ['Emma', 29, 'HR'],
    ['Oliver', 25, 'Finance'],
    ['Harry', 33, 'Marketing'],
    ['Sophia', 24, 'IT']]

df_3 = pd.DataFrame(data_rows, columns = ['Name', 'Age', 'Department'])
df_3

|   | Name | Age | Department |
|---|------|-----|------------|
| 0 | Emma | 29 | HR |
| 1 | Oliver | 25 | Finance |
| 2 | Harry | 33 | Marketing |
| 3 | Sophia | 24 | IT |


In [125]:
# Create pandas DataFrame from list of dictionaries
data_dict = [
    {'Name': 'Emma', 'Age': 29, 'Department': 'HR'},
    {'Name': 'Oliver', 'Age': 25, 'Department': 'Finance'},
    {'Name': 'Harry', 'Age': 33, 'Department': 'Marketing'},
    {'Name': 'Sophia', 'Age': 24, 'Department': 'IT'}]

df_4 = pd.DataFrame(data_dict)
df_4

|   | Name | Age | Department |
|---|------|-----|------------|
| 0 | Emma | 29 | HR |
| 1 | Oliver | 25 | Finance |
| 2 | Harry | 33 | Marketing |
| 3 | Sophia | 24 | IT |


In [126]:
# Create pandas Dataframe from dictionary of pandas Series

pd_series1 = pd.Series(['Emma', 'Oliver', 'Harry', 'Sophia'])
pd_series2 = pd.Series([29, 25, 33, 24])
pd_series3 = pd.Series(['HR', 'Finance', 'Marketing', 'IT'])

data_dict = {'Name': pd_series1, 'Age': pd_series2, 'Department': pd_series3}
# Create the DataFrame
df_5 = pd.DataFrame(data_dict)
df_5

|   | Name | Age | Department |
|---|------|-----|------------|
| 0 | Emma | 29 | HR |
| 1 | Oliver | 25 | Finance |
| 2 | Harry | 33 | Marketing |
| 3 | Sophia | 24 | IT |


* Reading files into a DataFrame

In [152]:
# Reading CSV files
# Tip: if the file starts with a byte-order mark (BOM), as some exports containing Arabic text do, use encoding='utf-8-sig'

df_csv = pd.read_csv(r"C:\Users\moham\Downloads\Pandas\Titanic Dataset.csv",
                   encoding = 'utf-8' , sep = "," )

In [153]:
df_csv.head()

|   | PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 0 | 3 | Braund, Mr. Owen Harris | male | 22.0 | 1 | 0 | A/5 21171 | 7.25 | NaN | S |
| 1 | 2 | 1 | 1 | Cumings, Mrs. John Bradley (Florence Briggs Th... | female | 38.0 | 1 | 0 | PC 17599 | 71.2833 | C85 | C |
| 2 | 3 | 1 | 3 | Heikkinen, Miss. Laina | female | 26.0 | 0 | 0 | STON/O2. 3101282 | 7.925 | NaN | S |
| 3 | 4 | 1 | 1 | Futrelle, Mrs. Jacques Heath (Lily May Peel) | female | 35.0 | 1 | 0 | 113803 | 53.1 | C123 | S |
| 4 | 5 | 0 | 3 | Allen, Mr. William Henry | male | 35.0 | 0 | 0 | 373450 | 8.05 | NaN | S |
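
`pd.read_csv` also accepts a file-like object, which makes it easy to try out options such as `sep` and `index_col` without a file on disk. The sketch below uses a hypothetical three-row, semicolon-separated sample; it is not the Titanic file itself, and `csv_text` and `df_sample` are illustrative names.

```python
import io
import pandas as pd

# Hypothetical semicolon-separated sample standing in for a CSV file on disk
csv_text = """PassengerId;Name;Age
1;Braund, Mr. Owen Harris;22
3;Heikkinen, Miss. Laina;26
5;Allen, Mr. William Henry;35
"""

# sep sets the column delimiter; index_col picks the column used as the row index
df_sample = pd.read_csv(io.StringIO(csv_text), sep=";", index_col="PassengerId")
print(df_sample)
```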


In [154]:
# Reading Excel files (sheet_name=0 selects the first sheet)
df_excel = pd.read_excel(r'C:\Users\moham\Downloads\Pandas\Pandas Excel.xlsx',sheet_name = 0)

In [156]:
df_excel.head()

|   | Country | Region | Requester | Date of Purchase | Total | Quantity |
|---|---|---|---|---|---|---|
| 0 | India | North | John | 2016-09-16 | 100000 | 567 |
| 1 | US | North | Bill | 2018-10-19 | 120000 | 3000 |
| 2 | UK | North | Thomas | 2014-06-10 | 140000 | 345 |
| 3 | Australia | East | John | 2010-11-23 | 160000 | 1000 |
| 4 | Africa | East | Bill | 2010-02-17 | 180000 | 123 |


In [170]:
# Reading JSON files
data_json = pd.read_json(r'C:\Users\moham\Downloads\Pandas\E-test.json')

In [171]:
data_json

|   | employee_id | name | position | salary |
|---|---|---|---|---|
| 0 | 1 | John Doe | Software Engineer | 80000 |
| 1 | 2 | Jane Smith | Data Scientist | 90000 |
| 2 | 3 | Bob Johnson | UX Designer | 75000 |
| 3 | 4 | Alice Williams | Project Manager | 100000 |
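
The layout of `E-test.json` is not shown in the notebook. One layout that produces a table like the one above is a JSON array of records, one object per row. The sketch below assumes that layout and reads it from an in-memory string; `json_text` and `df_json` are illustrative names.

```python
import io
import pandas as pd

# Assumed layout: a JSON array with one object per row
json_text = """[
  {"employee_id": 1, "name": "John Doe", "position": "Software Engineer", "salary": 80000},
  {"employee_id": 2, "name": "Jane Smith", "position": "Data Scientist", "salary": 90000}
]"""

# Each object becomes a row and each key becomes a column
df_json = pd.read_json(io.StringIO(json_text))
print(df_json)
```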


* Accessing DataFrame elements

In [193]:
data = pd.read_csv(r'C:\Users\moham\Downloads\Pandas\Titanic Dataset.csv',index_col = 0 )

In [194]:
data.head()

| PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 0 | 3 | Braund, Mr. Owen Harris | male | 22.0 | 1 | 0 | A/5 21171 | 7.25 | NaN | S |
| 2 | 1 | 1 | Cumings, Mrs. John Bradley (Florence Briggs Th... | female | 38.0 | 1 | 0 | PC 17599 | 71.2833 | C85 | C |
| 3 | 1 | 3 | Heikkinen, Miss. Laina | female | 26.0 | 0 | 0 | STON/O2. 3101282 | 7.925 | NaN | S |
| 4 | 1 | 1 | Futrelle, Mrs. Jacques Heath (Lily May Peel) | female | 35.0 | 1 | 0 | 113803 | 53.1 | C123 | S |
| 5 | 0 | 3 | Allen, Mr. William Henry | male | 35.0 | 0 | 0 | 373450 | 8.05 | NaN | S |


In [195]:
# Selecting multiple rows: a slice inside [] selects rows by position
data[:4]

| PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 0 | 3 | Braund, Mr. Owen Harris | male | 22.0 | 1 | 0 | A/5 21171 | 7.25 | NaN | S |
| 2 | 1 | 1 | Cumings, Mrs. John Bradley (Florence Briggs Th... | female | 38.0 | 1 | 0 | PC 17599 | 71.2833 | C85 | C |
| 3 | 1 | 3 | Heikkinen, Miss. Laina | female | 26.0 | 0 | 0 | STON/O2. 3101282 | 7.925 | NaN | S |
| 4 | 1 | 1 | Futrelle, Mrs. Jacques Heath (Lily May Peel) | female | 35.0 | 1 | 0 | 113803 | 53.1 | C123 | S |


In [196]:
data[0]

KeyError: 0
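
The `KeyError` above occurs because a single key inside `[]` is looked up among the column labels, and this DataFrame has no column named `0`. Only a slice inside `[]` selects rows. A minimal sketch of the four access styles on a small hypothetical frame (the data and names are illustrative, not the Titanic dataset):

```python
import pandas as pd

df = pd.DataFrame(
    {'Name': ['Emma', 'Oliver', 'Harry', 'Sophia'],
     'Age': [29, 25, 33, 24]},
    index=[1, 2, 3, 4])

first_rows = df[:2]            # slice: rows by position
name_column = df['Name']       # single key: a column, returned as a Series
first_row = df.iloc[0]         # one row by position
row_with_label_1 = df.loc[1]   # one row by index label

try:
    df[0]                      # no column is named 0
    raised = False
except KeyError:
    raised = True
```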

In [198]:
# Selecting Multiple columns
data[['Name','Survived']]

| PassengerId | Name | Survived |
|---|---|---|
| 1 | Braund, Mr. Owen Harris | 0 |
| 2 | Cumings, Mrs. John Bradley (Florence Briggs Th... | 1 |
| 3 | Heikkinen, Miss. Laina | 1 |
| 4 | Futrelle, Mrs. Jacques Heath (Lily May Peel) | 1 |
| 5 | Allen, Mr. William Henry | 0 |
| ... | ... | ... |
| 887 | Montvila, Rev. Juozas | 0 |
| 888 | Graham, Miss. Margaret Edith | 1 |
| 889 | Johnston, Miss. Catherine Helen "Carrie" | 0 |
| 890 | Behr, Mr. Karl Howell | 1 |


In [201]:
# loc: select rows by index label and columns by column name
data.loc[1]

Survived                          0
Pclass                            3
Name        Braund, Mr. Owen Harris
Sex                            male
Age                            22.0
SibSp                             1
Parch                             0
Ticket                    A/5 21171
Fare                           7.25
Cabin                           NaN
Embarked                          S
Name: 1, dtype: object

In [202]:
data.loc[1,'Name']

'Braund, Mr. Owen Harris'

In [203]:
# Filtering rows with a boolean condition inside loc
data.loc[data['Survived'] == 1,'Name']

PassengerId
2      Cumings, Mrs. John Bradley (Florence Briggs Th...
3                                 Heikkinen, Miss. Laina
4           Futrelle, Mrs. Jacques Heath (Lily May Peel)
9      Johnson, Mrs. Oscar W (Elisabeth Vilhelmina Berg)
10                   Nasser, Mrs. Nicholas (Adele Achem)
                             ...                        
876                     Najib, Miss. Adele Kiamie "Jane"
880        Potter, Mrs. Thomas Jr (Lily Alexenia Wilson)
881         Shelley, Mrs. William (Imanita Parrish Hall)
888                         Graham, Miss. Margaret Edith
890                                Behr, Mr. Karl Howell
Name: Name, Length: 342, dtype: object

In [205]:
# iloc: select rows and columns by integer position
data.iloc[0]

Survived                          0
Pclass                            3
Name        Braund, Mr. Owen Harris
Sex                            male
Age                            22.0
SibSp                             1
Parch                             0
Ticket                    A/5 21171
Fare                           7.25
Cabin                           NaN
Embarked                          S
Name: 1, dtype: object

In [206]:
data.iloc[0,2]

'Braund, Mr. Owen Harris'

In [207]:
# Filtering rows with a boolean Series does not work with iloc
data.iloc[data['Survived'] == 1,2]

NotImplementedError: iLocation based boolean indexing on an integer type is not available
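
`iloc` is purely positional, so it rejects a boolean Series that carries its own index, but it does accept a plain boolean array. Converting the mask with `.to_numpy()` is one way to make the call above work; for label-based filtering, `loc` remains the more natural choice. A minimal sketch on a small hypothetical frame (the data and names are illustrative):

```python
import pandas as pd

df = pd.DataFrame(
    {'Survived': [0, 1, 1, 0],
     'Pclass': [3, 1, 3, 3],
     'Name': ['Braund', 'Cumings', 'Heikkinen', 'Allen']},
    index=[1, 2, 3, 4])

mask = df['Survived'] == 1

# Strip the index from the mask so that iloc sees a plain boolean array
names_iloc = df.iloc[mask.to_numpy(), 2]

# Equivalent label-based selection
names_loc = df.loc[mask, 'Name']
```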