### Pandas: DataFrame and Series

Pandas is a powerful data manipulation library in Python, widely used for data analysis and data cleaning. It provides two primary data structures: Series and DataFrame. A Series is a one-dimensional array-like object, while a DataFrame is a two-dimensional, size-mutable, and potentially heterogeneous tabular data structure with labeled axes (rows and columns).

In [1]:
import pandas as pd

- A Series is a one-dimensional labeled array that can hold any data type (integers, strings, floating-point numbers, Python objects, etc.)
- It is similar to a single column in a spreadsheet or a SQL table
- The labels are collectively called the index
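
To make the role of the index concrete, here is a minimal sketch using a hypothetical `prices` Series (the names and values are illustrative only, not from the notebook's data). It shows that a value can be looked up either by its label or by its position.

```python
import pandas as pd

# Hypothetical example data: three values labeled with strings
prices = pd.Series([1.5, 2.0, 3.25], index=['apple', 'banana', 'cherry'])

print(prices.index.tolist())  # the labels that make up the index
print(prices['banana'])       # look up a value by its label
print(prices.iloc[0])         # look up a value by its position
```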

In [None]:
# Creating a Series
data = [1, 2, 3, 4, 5]
series = pd.Series(data)
print("Series: \n", series)

Series: 
 0    1
1    2
2    3
3    4
4    5
dtype: int64


In [3]:
# Create a Series from a dictionary (keys become the index)
data_dict = {'a': 1, 'b': 2, 'c': 3}
series_dict = pd.Series(data_dict)
print("Series from dictionary: \n", series_dict)

Series from dictionary: 
 a    1
b    2
c    3
dtype: int64


In [4]:
# Create series from a list with custom index
data_list = [10, 20, 30]
index_list = ['x', 'y', 'z']
series_list = pd.Series(data_list, index=index_list)
print("Series from list with custom index: \n", series_list)

Series from list with custom index: 
 x    10
y    20
z    30
dtype: int64


#### DataFrame

A DataFrame is a two-dimensional labeled table: each column is a Series, and different columns can hold different data types.

In [5]:
# DataFrame creation
data_frame = {
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 35],
    'City': ['New York', 'Los Angeles', 'Chicago']
}
df = pd.DataFrame(data_frame)
print("DataFrame: \n", df)  
print(type(df))  # Check the type of the DataFrame

DataFrame: 
       Name  Age         City
0    Alice   25     New York
1      Bob   30  Los Angeles
2  Charlie   35      Chicago
<class 'pandas.core.frame.DataFrame'>


In [6]:
# Create a DataFrame from a list of dictionaries (one dictionary per row)
data_list_of_dicts = [
    {'Name': 'Alice', 'Age': 25, 'City': 'New York'},
    {'Name': 'Bob', 'Age': 30, 'City': 'Los Angeles'},
    {'Name': 'Charlie', 'Age': 35, 'City': 'Chicago'}
]
df = pd.DataFrame(data_list_of_dicts)
print("DataFrame: \n", df)  
print(type(df)) 

DataFrame: 
       Name  Age         City
0    Alice   25     New York
1      Bob   30  Los Angeles
2  Charlie   35      Chicago
<class 'pandas.core.frame.DataFrame'>
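
When the dictionaries in the list do not all share the same keys, pandas still builds one column per key and fills the gaps with `NaN`. The sketch below uses a hypothetical `records` list in which one entry omits `'City'`.

```python
import pandas as pd

# Hypothetical records; the second one deliberately has no 'City' key
records = [
    {'Name': 'Alice', 'Age': 25, 'City': 'New York'},
    {'Name': 'Bob', 'Age': 30},
]
people = pd.DataFrame(records)

print(people)
print(people['City'].isna().tolist())  # True marks the missing value
```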


In [22]:
## reading a csv file
df = pd.read_csv('currency.csv')
df.head(5)  # Display the first 5 rows

|   | Code | Symbol | Name |
|---|------|--------|------|
| 0 | AED | د.إ | United Arab Emirates d |
| 1 | AFN | ؋ | Afghan afghani |
| 2 | ALL | L | Albanian lek |
| 3 | AMD | AMD | Armenian dram |
| 4 | ANG | ƒ | Netherlands Antillean gu |
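
`pd.read_csv` also accepts a file-like object, which is handy for trying it out without a file on disk. The sketch below assumes a small made-up CSV string (`csv_text` is hypothetical and is not the contents of `currency.csv`) and shows `head(n)` returning only the first `n` rows.

```python
import io
import pandas as pd

# Hypothetical CSV content held in memory instead of a file
csv_text = "Code,Name\nAED,Dirham\nAFN,Afghani\nALL,Lek\n"
sample = pd.read_csv(io.StringIO(csv_text))

print(sample.shape)    # (rows, columns)
print(sample.head(2))  # only the first two rows
```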


In [16]:
## Accessing DataFrame elements
df['Name']

0        United Arab Emirates d
1                Afghan afghani
2                  Albanian lek
3                 Armenian dram
4      Netherlands Antillean gu
                 ...           
158      West African CFA franc
159                   CFP franc
160                 Yemeni rial
161          South African rand
162              Zambian kwacha
Name: Name, Length: 163, dtype: object

In [17]:
type(df['Name'])

pandas.core.series.Series

In [None]:
df.loc[0] # Access the row whose index label is 0 (the first row here)

Code                         AED
Symbol                       د.إ
Name      United Arab Emirates d
Name: 0, dtype: object

In [19]:
df.iloc[1] # Access the second row (position 1) using iloc

Code                 AFN
Symbol                 ؋
Name      Afghan afghani
Name: 1, dtype: object
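
With the default index, `loc` and `iloc` look interchangeable because labels and positions coincide. The sketch below uses a hypothetical `cities` DataFrame with non-default integer labels to show the difference: `loc` selects by label, `iloc` by zero-based position.

```python
import pandas as pd

# Hypothetical frame whose index labels (10, 20, 30) differ from positions (0, 1, 2)
cities = pd.DataFrame(
    {'City': ['Chicago', 'Boston', 'Denver']},
    index=[10, 20, 30],
)

print(cities.loc[10, 'City'])  # label 10 is the first row
print(cities.iloc[1, 0])       # position 1 is the second row
```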

In [23]:
df.iloc[0, 2] # Access the value at row position 0, column position 2 using iloc

'United Arab Emirates d'

In [24]:
df

|   | Code | Symbol | Name |
|---|------|--------|------|
| 0 | AED | د.إ | United Arab Emirates d |
| 1 | AFN | ؋ | Afghan afghani |
| 2 | ALL | L | Albanian lek |
| 3 | AMD | AMD | Armenian dram |
| 4 | ANG | ƒ | Netherlands Antillean gu |
| ... | ... | ... | ... |
| 158 | XOF | CFA | West African CFA franc |
| 159 | XPF | Fr | CFP franc |
| 160 | YER | ﷼ | Yemeni rial |
| 161 | ZAR | R | South African rand |


In [27]:
## Accessing a single value by row label and column label using at
df.at[0, 'Name']

'United Arab Emirates d'

In [28]:
df.iat[2, 2] # Access the value at row position 2, column position 2 (zero-based) using iat

'Albanian lek'
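
`at` and `iat` read a single cell, and they can also write one. A minimal sketch, assuming a hypothetical `scores` DataFrame:

```python
import pandas as pd

# Hypothetical data for illustration
scores = pd.DataFrame({'Name': ['Alice', 'Bob'], 'Score': [80, 90]})

scores.at[1, 'Score'] = 95   # set one cell by row label and column label
print(scores.iat[1, 1])      # read the same cell by row and column position
print(scores.at[0, 'Name'])  # read a cell by labels
```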

In [31]:
## Adding a new column (a scalar value is broadcast to every row)
df['Country'] = 'USA'

In [32]:
df

|   | Code | Symbol | Name | Country |
|---|------|--------|------|---------|
| 0 | AED | د.إ | United Arab Emirates d | USA |
| 1 | AFN | ؋ | Afghan afghani | USA |
| 2 | ALL | L | Albanian lek | USA |
| 3 | AMD | AMD | Armenian dram | USA |
| 4 | ANG | ƒ | Netherlands Antillean gu | USA |
| ... | ... | ... | ... | ... |
| 158 | XOF | CFA | West African CFA franc | USA |
| 159 | XPF | Fr | CFP franc | USA |
| 160 | YER | ﷼ | Yemeni rial | USA |
| 161 | ZAR | R | South African rand | USA |
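
A new column does not have to be a constant; it can be computed from existing columns. The sketch below assumes a hypothetical `people` DataFrame and derives two columns, one with element-wise arithmetic and one with the `.str` accessor.

```python
import pandas as pd

# Hypothetical data for illustration
people = pd.DataFrame({'Name': ['Alice', 'Bob'], 'Age': [25, 30]})

people['AgeNextYear'] = people['Age'] + 1  # element-wise arithmetic on a column
people['Initial'] = people['Name'].str[0]  # first character of each name

print(people)
```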


In [33]:
# Removing a column
df.drop('Country', axis=1, inplace=True)

In [34]:
df

|   | Code | Symbol | Name |
|---|------|--------|------|
| 0 | AED | د.إ | United Arab Emirates d |
| 1 | AFN | ؋ | Afghan afghani |
| 2 | ALL | L | Albanian lek |
| 3 | AMD | AMD | Armenian dram |
| 4 | ANG | ƒ | Netherlands Antillean gu |
| ... | ... | ... | ... |
| 158 | XOF | CFA | West African CFA franc |
| 159 | XPF | Fr | CFP franc |
| 160 | YER | ﷼ | Yemeni rial |
| 161 | ZAR | R | South African rand |
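
Without `inplace=True`, `drop` leaves the original DataFrame untouched and returns a new one. A minimal sketch, assuming a hypothetical `people` DataFrame:

```python
import pandas as pd

# Hypothetical data for illustration
people = pd.DataFrame({
    'Name': ['Alice', 'Bob'],
    'Age': [25, 30],
    'City': ['New York', 'Chicago'],
})

trimmed = people.drop(columns=['City'])  # returns a new DataFrame

print(list(people.columns))   # the original still has 'City'
print(list(trimmed.columns))  # the copy does not
```

Assigning the result to a new name keeps the original available for later steps, which some readers find easier to follow than in-place mutation.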


In [35]:
df.describe() # Get summary statistics of the DataFrame

|   | Code | Symbol | Name |
|---|------|--------|------|
| count | 163 | 163 | 163 |
| unique | 163 | 107 | 163 |
| top | AED | $ | United Arab Emirates d |
| freq | 1 | 28 | 1 |
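
The summary above reports `count`, `unique`, `top`, and `freq` because every column holds strings. On a frame with numeric columns, `describe()` reports numeric statistics by default and skips the string columns unless `include='all'` is passed. A minimal sketch, assuming a hypothetical `people` DataFrame:

```python
import pandas as pd

# Hypothetical data mixing a string column and a numeric column
people = pd.DataFrame({'Name': ['Alice', 'Bob', 'Charlie'], 'Age': [25, 30, 35]})

numeric_summary = people.describe()           # numeric columns only
full_summary = people.describe(include='all') # string columns included too

print(numeric_summary)
print(full_summary)
```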
