# Why Use Pandas?

- Easy handling of missing data
- Powerful group-by functionality
- Fast and efficient merging, reshaping, slicing, and dicing of data
- Read/write support for many file formats: CSV, Excel, JSON, SQL, etc.
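As a rough illustration of the first three points, the sketch below fills a missing value, groups, and merges. The data, the column names, and the manager names are all made up for this example.

```python
import numpy as np
import pandas as pd

# Hypothetical sales records; one amount is missing (NaN).
sales = pd.DataFrame({
    "store": ["north", "south", "north", "south"],
    "amount": [120.0, np.nan, 80.0, 50.0],
})

# Missing data: replace the NaN in 'amount' with 0 before aggregating.
filled = sales.fillna({"amount": 0.0})

# Group-by: total amount per store.
totals = filled.groupby("store")["amount"].sum()

# Merging: attach a (hypothetical) manager name to each store total.
managers = pd.DataFrame({"store": ["north", "south"],
                         "manager": ["Asha", "Ravi"]})
report = totals.reset_index().merge(managers, on="store")
print(report)
```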

## Understanding Pandas Series and DataFrame Data Structures
### Definition & Functionality
- Series: A one-dimensional labeled array, similar to a column in a spreadsheet.
- DataFrame: A two-dimensional, table-like structure with labeled rows and columns, similar to an Excel sheet.
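One way to see how the two structures relate: selecting a single column or a single row of a DataFrame gives back a Series. The small table below is invented for illustration only.

```python
import pandas as pd

# A tiny, made-up table with custom row labels.
sheet = pd.DataFrame(
    {"name": ["pen", "book"], "price": [10, 250]},
    index=["r1", "r2"],
)

column = sheet["price"]   # one column -> Series labeled by the row index
row = sheet.loc["r1"]     # one row    -> Series labeled by the column names

print(type(column).__name__)
print(column.index.tolist())
print(row.index.tolist())
```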

In [6]:
import pandas as pd
import numpy as np

### Creating series manually

In [8]:
s = pd.Series([10,20,30,40], index=['a','b','c','d'])
print("series:\n",s)

series:
 a    10
b    20
c    30
d    40
dtype: int64


In [13]:
prices = [649000, 391000, 5476000, 1786000, 1091000]
carnames = ['swift', 'santro', 'audi', 'elantra', 'bolero']

In [10]:
car_series = pd.Series(data=prices, index = carnames)
car_series

swift       649000
santro      391000
audi       5476000
elantra    1786000
bolero     1091000
dtype: int64

In [11]:
type(car_series)

pandas.core.series.Series

In [14]:
car_series = pd.Series(data=prices, index = carnames,name = 'price')
car_series

swift       649000
santro      391000
audi       5476000
elantra    1786000
bolero     1091000
Name: price, dtype: int64

In [16]:
## Example

entries = {'swift': 649000,
'santro': '391000',
'audi': 5476000,
'elantra': 1786000,
'bolero': 1091000}
car_series = pd.Series(data = entries, name = 'price')
car_series

swift       649000
santro      391000
audi       5476000
elantra    1786000
bolero     1091000
Name: price, dtype: object

In [17]:
type(car_series['swift'])

int

In [18]:
type(car_series['santro'])

str
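The `object` dtype above appears because one price was entered as a string. When such values are known to be numeric, `pd.to_numeric` is one way to convert them. A minimal sketch on a shortened copy of the data:

```python
import pandas as pd

# Same idea as above: one price is accidentally a string.
entries = {'swift': 649000, 'santro': '391000', 'audi': 5476000}
mixed = pd.Series(entries, name='price')
print(mixed.dtype)      # object, because the values have mixed Python types

# Convert every element to a number; numeric strings are parsed.
numeric = pd.to_numeric(mixed)
print(numeric.dtype)    # an integer dtype

# Numeric comparisons now behave as expected.
print(numeric > 1000000)
```

If some strings might not be valid numbers, `pd.to_numeric(mixed, errors='coerce')` turns them into `NaN` instead of raising an error.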

In [19]:
len(car_series)

5

In [20]:
car_series.shape

(5,)

### Example
#### Accessing data from series using logical conditions

In [22]:
entries = {'swift': 649000,
'santro' : 391000,
'audi': 5476000,
'elantra': 1786000,
'bolero': 1091000}

car_series = pd.Series(data = entries, name = 'price')
car_series

swift       649000
santro      391000
audi       5476000
elantra    1786000
bolero     1091000
Name: price, dtype: int64

In [26]:
car_series > 1000000

swift      False
santro     False
audi        True
elantra     True
bolero      True
Name: price, dtype: bool


In [25]:
car_series[car_series > 1000000]

audi       5476000
elantra    1786000
bolero     1091000
Name: price, dtype: int64

In [27]:
car_series[car_series > 1000000].index[0]

'audi'

In [28]:
car_series[car_series > 1000000].index[2]

'bolero'

In [29]:
car_series[car_series > 1000000].values[0]

np.int64(5476000)

In [30]:
car_series[car_series > 1000000].values[2]

np.int64(1091000)

In [32]:
car_series[(car_series > 1000000) & (car_series < 2000000)]

elantra    1786000
bolero     1091000
Name: price, dtype: int64
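Conditions can also be combined with `|` (or) and negated with `~` (not). Each comparison needs its own parentheses because these operators bind more tightly than `>` and `<`. The sketch below reuses the same car prices; the variable names are only illustrative.

```python
import pandas as pd

car_series = pd.Series({'swift': 649000, 'santro': 391000, 'audi': 5476000,
                        'elantra': 1786000, 'bolero': 1091000}, name='price')

# OR: cars that are either very cheap or very expensive.
cheap_or_premium = car_series[(car_series < 500000) | (car_series > 5000000)]

# NOT: cars that do not cost more than 1,000,000.
not_premium = car_series[~(car_series > 1000000)]

# between() is a shorthand for a range check (both ends included by default).
mid_range = car_series[car_series.between(1000000, 2000000)]

print(cheap_or_premium.index.tolist())
print(not_premium.index.tolist())
print(mid_range.index.tolist())
```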

## Quiz time

Consider the series shown below:
cust_names = ['Hemang', 'Farheen', 'Himadri', 'Monisha']
cust_bill = [256.78, 434.53, 109.25, 529.42]
cust_info = pd.Series(cust_bill, cust_names)
Write code to print the names of the customers who have spent more than 300 rupees.

In [35]:
cust_names = ['Hemang', 'Farheen', 'Himadri', 'Monisha']
cust_bill = [256.78, 434.53, 109.25, 529.42] 

cust_info = pd.Series(cust_bill, cust_names)

cust_info

Hemang     256.78
Farheen    434.53
Himadri    109.25
Monisha    529.42
dtype: float64

In [38]:
print(list(cust_info[cust_info>300].index))

['Farheen', 'Monisha']


In [40]:
### Example
# Accessing data from series using the loc[] method

entries = {'swift': 649000,
'santro': 391000,
'audi': 5476000,
'elantra': 1786000,
'bolero': 1091000}

car_series = pd.Series(data = entries, name = 'price')

car_series

swift       649000
santro      391000
audi       5476000
elantra    1786000
bolero     1091000
Name: price, dtype: int64

In [41]:
car_series.loc['swift']

np.int64(649000)

In [42]:
car_series.loc[['swift']]

swift    649000
Name: price, dtype: int64

In [43]:
car_series.loc[['swift','audi']]

swift     649000
audi     5476000
Name: price, dtype: int64

In [46]:
car_series.loc['swift':'elantra']

swift       649000
santro      391000
audi       5476000
elantra    1786000
Name: price, dtype: int64

### NOTE: This is similar to NumPy array slicing, except that .loc[] also includes the stop label.
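To make the difference concrete, the sketch below slices the same prices once as a plain NumPy array and once with `.loc[]`. The variable names are chosen for this example only.

```python
import numpy as np
import pandas as pd

prices = np.array([649000, 391000, 5476000, 1786000, 1091000])
car_series = pd.Series(prices,
                       index=['swift', 'santro', 'audi', 'elantra', 'bolero'],
                       name='price')

# NumPy: positions 0, 1, 2 (the stop position 3 is excluded).
numpy_slice = prices[0:3]

# .loc[]: labels from 'swift' up to and including 'elantra'.
label_slice = car_series.loc['swift':'elantra']

print(len(numpy_slice), len(label_slice))
```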

In [47]:
car_series.loc[:'elantra']

swift       649000
santro      391000
audi       5476000
elantra    1786000
Name: price, dtype: int64

### Example
#### Accessing data from series using the iloc[] method

In [48]:
car_series.iloc[0]

np.int64(649000)

In [49]:
car_series.iloc[[0,2,4]]

swift      649000
audi      5476000
bolero    1091000
Name: price, dtype: int64

In [50]:
car_series.iloc[0:2]

swift     649000
santro    391000
Name: price, dtype: int64

NOTE: Unlike .loc[], the .iloc[] method does not include the stop element. It behaves much like NumPy array indexing and slicing.
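The distinction matters most when the index labels are themselves integers, because a number can then mean either a label or a position. A small sketch with made-up data:

```python
import pandas as pd

# Hypothetical series whose labels are integers (10, 20, 30, 40).
ids = pd.Series(['a', 'b', 'c', 'd'], index=[10, 20, 30, 40])

by_label = ids.loc[10]         # label 10
by_position = ids.iloc[0]      # position 0 (the same element here)

label_slice = ids.loc[10:30]   # labels 10, 20, 30 (stop label included)
position_slice = ids.iloc[0:2] # positions 0 and 1 (stop position excluded)

print(label_slice.tolist())
print(position_slice.tolist())
```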

In [52]:
car_series.iloc[-1]

np.int64(1091000)

In [53]:
car_series.iloc[1:5:2]

santro      391000
elantra    1786000
Name: price, dtype: int64

In [54]:
car_series

swift       649000
santro      391000
audi       5476000
elantra    1786000
bolero     1091000
Name: price, dtype: int64

### Quiz
Consider the series shown below:
```
cust_names = ['Mahesh', 'Farheen', 'Himadri', 'Monisha']
cust_bill = [256.78, 434.53, 109.25, 529.42]
cust_info = pd.Series(cust_bill, cust_names)
```

Use the different methods you have studied to extract the bill amounts for Mahesh and Monisha.

In [58]:
cust_names = ['Mahesh', 'Farheen', 'Himadri', 'Monisha']
cust_bill = [256.78, 434.53, 109.25, 529.42]
cust_info = pd.Series(cust_bill, cust_names)
cust_info

Mahesh     256.78
Farheen    434.53
Himadri    109.25
Monisha    529.42
dtype: float64

In [62]:
print(cust_info.loc[['Mahesh','Monisha']])

Mahesh     256.78
Monisha    529.42
dtype: float64


In [69]:
print(cust_info.iloc[[0,3]])

Mahesh     256.78
Monisha    529.42
dtype: float64


## Handling DataFrames
### Definition & Functionality
- Loading Data: Import CSV, Excel, or SQL files into Pandas DataFrames.
- Inspecting Data: View structure, columns, and basic statistics.
- Modifying Data: Add, rename, or remove columns.
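The "Modifying Data" point can be sketched as follows. The `products` table and the `Total` and `Qty` column names are assumptions made for this example.

```python
import pandas as pd

products = pd.DataFrame({
    'Product': ['A', 'B', 'C'],
    'Price': [100, 150, 200],
    'Quantity': [5, 3, 4],
})

# Add: a new column computed from existing ones.
products['Total'] = products['Price'] * products['Quantity']

# Rename: returns a new DataFrame, so assign the result back.
products = products.rename(columns={'Quantity': 'Qty'})

# Remove: drop a column by name.
products = products.drop(columns=['Price'])

print(products.columns.tolist())
print(products['Total'].tolist())
```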

### Creating a DataFrame manually

In [72]:
df = pd.DataFrame({
'Product': ['A', 'B', 'C'],
'Price': [100, 150, 200],
'Quantity': [5, 3, 4]
})
df

  Product  Price  Quantity
0       A    100         5
1       B    150         3
2       C    200         4


In [73]:
df.to_csv("retail_sales.csv",index=False)
print("successfully created")

successfully created


Note: The file is created in the current working directory, that is, the folder where Jupyter is running.
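The `index=False` argument used above keeps the row labels out of the file. The sketch below shows what happens with and without it. It writes to in-memory strings via `io.StringIO` instead of real files so that it can run anywhere.

```python
import io
import pandas as pd

df = pd.DataFrame({
    'Product': ['A', 'B', 'C'],
    'Price': [100, 150, 200],
    'Quantity': [5, 3, 4],
})

# With no path, to_csv() returns the CSV text as a string.
with_index = df.to_csv()                # row labels written as a first column
without_index = df.to_csv(index=False)  # only the data columns

reloaded_with = pd.read_csv(io.StringIO(with_index))
reloaded_without = pd.read_csv(io.StringIO(without_index))

# The saved index comes back as an extra, unnamed column.
print(reloaded_with.columns.tolist())
print(reloaded_without.columns.tolist())
```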

### Reading CSV using pandas

In [75]:
sales_data = pd.read_csv("retail_sales.csv")
sales_data

  Product  Price  Quantity
0       A    100         5
1       B    150         3
2       C    200         4


In [76]:
# Print only one column
sales_data['Price']

0    100
1    150
2    200
Name: Price, dtype: int64

In [78]:
sales_data['Price'].values[0]

np.int64(100)

In [84]:
# How to read a CSV file from a specific location
filepath = "/Users/amupraba/Desktop/ScriptlessAutomtionTool/Performance_Benchmarks/test_data/api/test_case_flows/feature1/test1_sanity.csv"
df = pd.read_csv(filepath)

In [85]:
df

   DEPENDANT_TEST_CASE                                          NONE
0            END_POINT  https://jsonplaceholder.typicode.com/todos/1
1               METHOD                                           GET
2           PARAMS:KEY                                          NONE
3         PARAMS:VALUE                                          NONE
4             AUTH:KEY                                          NONE
5           AUTH:VALUE                                          NONE
6          HEADERS:KEY                                          NONE
7        HEADERS:VALUE                                          NONE
8             BODY:KEY                                          NONE
9           BODY:VALUE                                          NONE


In [87]:
df.head()

   DEPENDANT_TEST_CASE                                          NONE
0            END_POINT  https://jsonplaceholder.typicode.com/todos/1
1               METHOD                                           GET
2           PARAMS:KEY                                          NONE
3         PARAMS:VALUE                                          NONE
4             AUTH:KEY                                          NONE


In [89]:
# Alternative: the r prefix marks a raw string, so backslashes in the path are not treated as escape characters.
#filepath = r"/Users/amupraba/Desktop/ScriptlessAutomtionTool/Performance_Benchmarks/test_data/api/test_case_flows/feature1/test1_sanity.csv"
#df = pd.read_csv(filepath)
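Raw strings matter mostly for Windows-style paths, where backslashes would otherwise be read as escape sequences. The path below is hypothetical and is never opened; the sketch only compares the two string literals.

```python
from pathlib import PureWindowsPath

# Hypothetical Windows path, used only to compare the two literals.
plain = "C:\new_data\table.csv"   # "\n" becomes a newline, "\t" a tab
raw = r"C:\new_data\table.csv"    # backslashes are kept as written

print(len(plain), len(raw))       # the plain string is two characters shorter
print("\n" in plain, "\n" in raw)

# The raw string parses as the intended path.
print(PureWindowsPath(raw).name)
```

Paths written with forward slashes, like the one used in this notebook, contain no backslashes, so the `r` prefix makes no difference for them.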

In [90]:
df.head(7)

   DEPENDANT_TEST_CASE                                          NONE
0            END_POINT  https://jsonplaceholder.typicode.com/todos/1
1               METHOD                                           GET
2           PARAMS:KEY                                          NONE
3         PARAMS:VALUE                                          NONE
4             AUTH:KEY                                          NONE
5           AUTH:VALUE                                          NONE
6          HEADERS:KEY                                          NONE


In [91]:
df.tail()

        DEPENDANT_TEST_CASE  NONE
10            RESPONSE:CODE   200
11          RESPONSE:SCHEMA  NONE
12       RESPONSE:JSON_PATH  NONE
13  RESPONSE:EXPECTED_VALUE  NONE
14     RESPONSE:STORE_VALUE  NONE


In [93]:
df.tail(7)

        DEPENDANT_TEST_CASE  NONE
8                  BODY:KEY  NONE
9                BODY:VALUE  NONE
10            RESPONSE:CODE   200
11          RESPONSE:SCHEMA  NONE
12       RESPONSE:JSON_PATH  NONE
13  RESPONSE:EXPECTED_VALUE  NONE
14     RESPONSE:STORE_VALUE  NONE


In [94]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 15 entries, 0 to 14
Data columns (total 2 columns):
 #   Column               Non-Null Count  Dtype 
---  ------               --------------  ----- 
 0   DEPENDANT_TEST_CASE  15 non-null     object
 1   NONE                 15 non-null     object
dtypes: object(2)
memory usage: 372.0+ bytes


In [95]:
df.describe()

        DEPENDANT_TEST_CASE  NONE
count                    15    15
unique                   15     4
top               END_POINT  NONE
freq                      1    12
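The summary above has `count`, `unique`, `top`, and `freq` rows because both columns hold text. For numeric columns, `describe()` reports statistics such as the mean and quartiles instead. A small sketch using the product table from earlier:

```python
import pandas as pd

df = pd.DataFrame({
    'Product': ['A', 'B', 'C'],
    'Price': [100, 150, 200],
    'Quantity': [5, 3, 4],
})

# By default only the numeric columns are summarized.
numeric_summary = df.describe()

# Text columns get count / unique / top / freq.
text_summary = df[['Product']].describe()

print(numeric_summary.columns.tolist())
print(numeric_summary.loc['mean', 'Price'])
print(text_summary.index.tolist())
```

Passing `include='all'` to `describe()` summarizes both kinds of columns in a single table, with `NaN` where a statistic does not apply.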
