# Introduction to Pandas

- Pandas is an open-source Python library that provides data structures and data analysis tools.
- It’s built on top of NumPy, so it’s fast for numerical operations.
- The name comes from “panel data”, an econometrics term for multi-dimensional data sets.

## Why Pandas?

Uses of Pandas in Python:

1. Store data in an easy-to-use table format (DataFrame).
2. Read data from files like CSV, Excel, JSON, SQL, etc.
3. Clean data by removing or filling missing values.
4. Filter and select specific rows or columns.
5. Sort data by any column.
6. Merge or join multiple datasets.
7. Group data and calculate sums, averages, counts, etc.
8. Handle date and time data easily.
9. Save processed data back to CSV, Excel, etc.
10. Quickly analyze and explore large datasets.


Why Pandas is Necessary:

1. It lets us work with data as tables, much like a spreadsheet.
2. Without Pandas, handling large datasets by hand is slow and error-prone.
3. It gives tools to clean messy or incomplete data quickly.
4. We can analyze data and find patterns faster.
5. It supports many file types, so there is no need to write custom parsing code for each.
6. It makes sorting, filtering, and grouping data simple.
7. It saves time and effort when working with large datasets.
8. It integrates well with other Python libraries for data science.
9. It enables quick visualization and reporting of data.
10. Overall, it makes data science work smooth and efficient.


Use of Pandas in Data Science:

1. Data Loading: Read data from CSV, Excel, databases easily.
2. Data Cleaning: Handle missing values, remove duplicates, fix errors.
3. Data Exploration: Summarize data with statistics and visual checks.
4. Data Transformation: Filter, sort, group, and modify data.
5. Feature Engineering: Create new columns or features from existing data.
6. Data Integration: Merge or join multiple datasets.
7. Time Series Analysis: Work with dates and times effectively.
8. Preparing Data for Machine Learning: Format and structure data for ML models.
9. Quick Prototyping: Test ideas fast with easy data manipulation.
10. Exporting Results: Save cleaned and processed data for reports or further analysis.
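
A few of the steps listed above (loading, cleaning, grouping, exporting) can be sketched in a handful of lines. This is only an illustration: the inline data and the column names `city` and `sales` are made up, and `io.StringIO` stands in for a real CSV file.

```python
import io
import pandas as pd

# Hypothetical inline data standing in for a real CSV file.
raw = io.StringIO(
    "city,sales\n"
    "Pune,100\n"
    "Pune,100\n"   # exact duplicate of the previous row
    "Delhi,\n"     # missing sales value
    "Delhi,250\n"
)

data = pd.read_csv(raw)                        # data loading
data = data.drop_duplicates()                  # data cleaning: drop repeated rows
data["sales"] = data["sales"].fillna(0)        # data cleaning: fill missing values
totals = data.groupby("city")["sales"].sum()   # data transformation: group and aggregate
csv_text = totals.to_csv()                     # exporting: returns a string when no path is given

print(totals)
```

Each step returns a new object, so the intermediate results can be inspected one at a time in a notebook.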


### Read CSV File

In [11]:
pip install pandas


In [12]:
import pandas as pd

In [13]:
df = pd.read_csv("services.csv")
df

Unnamed: 0,id,location_id,program_id,accepted_payments,alternate_name,application_process,audience,description,eligibility,email,...,interpretation_services,keywords,languages,name,required_documents,service_areas,status,wait_time,website,taxonomy_ids
0,1,1,,,,Walk in or apply by phone.,"Older adults age 55 or over, ethnic minorities...",A walk-in center for older adults that provide...,"Age 55 or over for most programs, age 60 or ov...",,...,,"ADULT PROTECTION AND CARE SERVICES, Meal Sites...",,Fair Oaks Adult Activity Center,,Colma,active,No wait.,,
1,2,2,,,,Apply by phone for an appointment.,Residents of San Mateo County age 55 or over,Provides training and job placement to eligibl...,"Age 55 or over, county resident and willing an...",,...,,"EMPLOYMENT/TRAINING SERVICES, Job Development,...",,Second Career Employment Program,,San Mateo County,active,Varies.,,
2,3,3,,,,Phone for information (403-4300 Ext. 4322).,Older adults age 55 or over who can benefit fr...,Offers supportive counseling services to San M...,Resident of San Mateo County age 55 or over,,...,,"Geriatric Counseling, Older Adults, Gay, Lesbi...",,Senior Peer Counseling,,San Mateo County,active,Varies.,,
3,4,4,,,,Apply by phone.,"Parents, children, families with problems of c...",Provides supervised visitation services and a ...,,,...,,"INDIVIDUAL AND FAMILY DEVELOPMENT SERVICES, Gr...",,Family Visitation Center,,San Mateo County,active,No wait.,,
4,5,5,,,,Phone for information.,Low-income working families with children tran...,Provides fixed 8% short term loans to eligible...,Eligibility: Low-income family with legal cust...,,...,,"COMMUNITY SERVICES, Speakers, Automobile Loans",,Economic Self-Sufficiency Program,,San Mateo County,active,,,
5,6,6,,,,Walk in or apply by phone for membership appli...,Any age,A multipurpose center offering a wide variety ...,,,...,,"ADULT PROTECTION AND CARE SERVICES, In-Home Su...",,Little House Recreational Activities,,San Mateo County,active,No wait.,,
6,7,7,,,,"Apply by phone or be referred by a doctor, soc...","Older adults who have memory or sensory loss, ...",Rosener House is a day center for older adults...,Age 18 or over,,...,,"ADULT PROTECTION AND CARE SERVICES, Adult Day ...",,Rosener House Adult Day Services,,"Belmont, Burlingame, East Palo Alto",active,No wait.,,
7,8,8,,,,Apply by phone.,"Senior citizens age 60 or over, disabled indiv...",Delivers a hot meal to the home of persons age...,Homebound person unable to cook or shop,,...,,"ADULT PROTECTION AND CARE SERVICES, Meal Sites...",,Meals on Wheels - South County,,"Belmont, East Palo Alto",active,No wait.,,
8,9,9,,,,Walk in. Proof of residency in California requ...,"Ethnic minorities, especially Spanish speaking","Provides general reading material, including b...",Resident of California to obtain a library card,,...,,"EDUCATION SERVICES, Library, Libraries, Public...",,Fair Oaks Branch,,San Mateo County,active,No wait.,,
9,10,10,,,,Walk in. Proof of California residency to rece...,,"Provides general reading and media materials, ...",Resident of California to obtain a card,,...,,"EDUCATION SERVICES, Library, Libraries, Public...",,Main Library,,San Mateo County,active,No wait.,,


- By default, Pandas treats the first row of the CSV file as the column names.
- If we don't want Pandas to treat the first row as column names, we pass `header=None`: `pd.read_csv("name.csv", header=None)`.
- Pandas then provides a default integer index for both rows and columns.
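
The effect of `header=None` is easiest to see on a tiny file. The sketch below uses a made-up two-row CSV held in memory with `io.StringIO`, so it runs without `services.csv`.

```python
import io
import pandas as pd

csv_text = "name,score\nAsha,90\nRavi,85\n"   # hypothetical sample data

# Default: the first line becomes the column names.
with_header = pd.read_csv(io.StringIO(csv_text))

# header=None: the first line is treated as data; columns are numbered 0, 1, ...
no_header = pd.read_csv(io.StringIO(csv_text), header=None)

print(with_header.shape)         # (2, 2)
print(no_header.shape)           # (3, 2)
print(list(no_header.columns))   # [0, 1]
```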

In [14]:
pd.read_csv("services.csv", header = None) 
# Above: 23 rows, because the first line of the file was used as the column names
# Here: 24 rows, because the first line is treated as data and there are no column names

Unnamed: 0,0,1,2,3,4,5,6,7,8,9,...,12,13,14,15,16,17,18,19,20,21
0,id,location_id,program_id,accepted_payments,alternate_name,application_process,audience,description,eligibility,email,...,interpretation_services,keywords,languages,name,required_documents,service_areas,status,wait_time,website,taxonomy_ids
1,1,1,,,,Walk in or apply by phone.,"Older adults age 55 or over, ethnic minorities...",A walk-in center for older adults that provide...,"Age 55 or over for most programs, age 60 or ov...",,...,,"ADULT PROTECTION AND CARE SERVICES, Meal Sites...",,Fair Oaks Adult Activity Center,,Colma,active,No wait.,,
2,2,2,,,,Apply by phone for an appointment.,Residents of San Mateo County age 55 or over,Provides training and job placement to eligibl...,"Age 55 or over, county resident and willing an...",,...,,"EMPLOYMENT/TRAINING SERVICES, Job Development,...",,Second Career Employment Program,,San Mateo County,active,Varies.,,
3,3,3,,,,Phone for information (403-4300 Ext. 4322).,Older adults age 55 or over who can benefit fr...,Offers supportive counseling services to San M...,Resident of San Mateo County age 55 or over,,...,,"Geriatric Counseling, Older Adults, Gay, Lesbi...",,Senior Peer Counseling,,San Mateo County,active,Varies.,,
4,4,4,,,,Apply by phone.,"Parents, children, families with problems of c...",Provides supervised visitation services and a ...,,,...,,"INDIVIDUAL AND FAMILY DEVELOPMENT SERVICES, Gr...",,Family Visitation Center,,San Mateo County,active,No wait.,,
5,5,5,,,,Phone for information.,Low-income working families with children tran...,Provides fixed 8% short term loans to eligible...,Eligibility: Low-income family with legal cust...,,...,,"COMMUNITY SERVICES, Speakers, Automobile Loans",,Economic Self-Sufficiency Program,,San Mateo County,active,,,
6,6,6,,,,Walk in or apply by phone for membership appli...,Any age,A multipurpose center offering a wide variety ...,,,...,,"ADULT PROTECTION AND CARE SERVICES, In-Home Su...",,Little House Recreational Activities,,San Mateo County,active,No wait.,,
7,7,7,,,,"Apply by phone or be referred by a doctor, soc...","Older adults who have memory or sensory loss, ...",Rosener House is a day center for older adults...,Age 18 or over,,...,,"ADULT PROTECTION AND CARE SERVICES, Adult Day ...",,Rosener House Adult Day Services,,"Belmont, Burlingame, East Palo Alto",active,No wait.,,
8,8,8,,,,Apply by phone.,"Senior citizens age 60 or over, disabled indiv...",Delivers a hot meal to the home of persons age...,Homebound person unable to cook or shop,,...,,"ADULT PROTECTION AND CARE SERVICES, Meal Sites...",,Meals on Wheels - South County,,"Belmont, East Palo Alto",active,No wait.,,
9,9,9,,,,Walk in. Proof of residency in California requ...,"Ethnic minorities, especially Spanish speaking","Provides general reading material, including b...",Resident of California to obtain a library card,,...,,"EDUCATION SERVICES, Library, Libraries, Public...",,Fair Oaks Branch,,San Mateo County,active,No wait.,,


### Useful `read_csv` parameters

#### Skipping rows

In [15]:
pd.read_csv("services.csv", skiprows= 2)

Unnamed: 0,2,2.1,Unnamed: 2,Unnamed: 3,Unnamed: 4,Apply by phone for an appointment.,Residents of San Mateo County age 55 or over,Provides training and job placement to eligible people age 55 or over who meet certain income qualifications.,"Age 55 or over, county resident and willing and able to work. Income requirements vary according to program",Unnamed: 9,...,Unnamed: 12,"EMPLOYMENT/TRAINING SERVICES, Job Development, Job Information/Placement/Referral, Job Training, Job Training Formats, Job Search/Placement, Older Adults",Unnamed: 14,Second Career Employment Program,Unnamed: 16,San Mateo County,active,Varies.,Unnamed: 20,Unnamed: 21
0,3,3,,,,Phone for information (403-4300 Ext. 4322).,Older adults age 55 or over who can benefit fr...,Offers supportive counseling services to San M...,Resident of San Mateo County age 55 or over,,...,,"Geriatric Counseling, Older Adults, Gay, Lesbi...",,Senior Peer Counseling,,San Mateo County,active,Varies.,,
1,4,4,,,,Apply by phone.,"Parents, children, families with problems of c...",Provides supervised visitation services and a ...,,,...,,"INDIVIDUAL AND FAMILY DEVELOPMENT SERVICES, Gr...",,Family Visitation Center,,San Mateo County,active,No wait.,,
2,5,5,,,,Phone for information.,Low-income working families with children tran...,Provides fixed 8% short term loans to eligible...,Eligibility: Low-income family with legal cust...,,...,,"COMMUNITY SERVICES, Speakers, Automobile Loans",,Economic Self-Sufficiency Program,,San Mateo County,active,,,
3,6,6,,,,Walk in or apply by phone for membership appli...,Any age,A multipurpose center offering a wide variety ...,,,...,,"ADULT PROTECTION AND CARE SERVICES, In-Home Su...",,Little House Recreational Activities,,San Mateo County,active,No wait.,,
4,7,7,,,,"Apply by phone or be referred by a doctor, soc...","Older adults who have memory or sensory loss, ...",Rosener House is a day center for older adults...,Age 18 or over,,...,,"ADULT PROTECTION AND CARE SERVICES, Adult Day ...",,Rosener House Adult Day Services,,"Belmont, Burlingame, East Palo Alto",active,No wait.,,
5,8,8,,,,Apply by phone.,"Senior citizens age 60 or over, disabled indiv...",Delivers a hot meal to the home of persons age...,Homebound person unable to cook or shop,,...,,"ADULT PROTECTION AND CARE SERVICES, Meal Sites...",,Meals on Wheels - South County,,"Belmont, East Palo Alto",active,No wait.,,
6,9,9,,,,Walk in. Proof of residency in California requ...,"Ethnic minorities, especially Spanish speaking","Provides general reading material, including b...",Resident of California to obtain a library card,,...,,"EDUCATION SERVICES, Library, Libraries, Public...",,Fair Oaks Branch,,San Mateo County,active,No wait.,,
7,10,10,,,,Walk in. Proof of California residency to rece...,,"Provides general reading and media materials, ...",Resident of California to obtain a card,,...,,"EDUCATION SERVICES, Library, Libraries, Public...",,Main Library,,San Mateo County,active,No wait.,,
8,11,11,,,,Walk in. Proof of California residency require...,,"Provides general reading materials, including ...",Resident of California to obtain a library car...,,...,,"EDUCATION SERVICES, Library, Libraries, Public...",,Schaberg Branch,,San Mateo County,active,No wait.,,
9,12,12,,,,"Walk in or apply by phone, email or webpage re...","Adults, parents, children in 1st-12th grades i...",Offers an intergenerational literacy program f...,English-speaking adult reading at or below 7th...,,...,,"EDUCATION SERVICES, Adult, Alternative, Litera...",,Project Read,,Daly City,active,Depends on availability of tutors for small gr...,,
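
Note that in the output above the original header line is gone and the second data row has been used as the column names. With an integer, `skiprows` counts lines from the top of the file, header included. A small sketch on made-up data shows the difference between an integer and a list of line numbers:

```python
import io
import pandas as pd

csv_text = "id,name\n1,A\n2,B\n3,C\n4,D\n"   # hypothetical sample data

# An integer skips that many lines from the top, header line included,
# so the line "2,B" ends up being used as the column names.
skipped_int = pd.read_csv(io.StringIO(csv_text), skiprows=2)

# A list skips specific 0-based line numbers; line 0 (the header) is kept.
skipped_list = pd.read_csv(io.StringIO(csv_text), skiprows=[1, 2])

print(list(skipped_int.columns))    # ['2', 'B']
print(list(skipped_list.columns))   # ['id', 'name']
print(len(skipped_list))            # 2
```

Passing a list is usually what is wanted when the header should be kept and only some data rows dropped.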


#### Skipping columns (keeping only selected columns with `usecols`)

In [16]:
pd.read_csv("services.csv", usecols= ['program_id', 'application_process'])

Unnamed: 0,program_id,application_process
0,,Walk in or apply by phone.
1,,Apply by phone for an appointment.
2,,Phone for information (403-4300 Ext. 4322).
3,,Apply by phone.
4,,Phone for information.
5,,Walk in or apply by phone for membership appli...
6,,"Apply by phone or be referred by a doctor, soc..."
7,,Apply by phone.
8,,Walk in. Proof of residency in California requ...
9,,Walk in. Proof of California residency to rece...
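
`usecols` accepts either column names or 0-based column positions. A minimal sketch on made-up data (the column names here are illustrative, not taken from `services.csv`):

```python
import io
import pandas as pd

csv_text = "id,name,status\n1,A,active\n2,B,inactive\n"   # hypothetical sample data

# Select columns by name ...
by_name = pd.read_csv(io.StringIO(csv_text), usecols=["id", "status"])

# ... or by 0-based position in the file.
by_position = pd.read_csv(io.StringIO(csv_text), usecols=[0, 2])

print(list(by_name.columns))         # ['id', 'status']
print(by_name.equals(by_position))   # True
```

Columns that are not listed are never loaded, which can save memory on wide files.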


# How to read the data using Pandas

In [17]:
df = pd.read_csv("services.csv")
df

Unnamed: 0,id,location_id,program_id,accepted_payments,alternate_name,application_process,audience,description,eligibility,email,...,interpretation_services,keywords,languages,name,required_documents,service_areas,status,wait_time,website,taxonomy_ids
0,1,1,,,,Walk in or apply by phone.,"Older adults age 55 or over, ethnic minorities...",A walk-in center for older adults that provide...,"Age 55 or over for most programs, age 60 or ov...",,...,,"ADULT PROTECTION AND CARE SERVICES, Meal Sites...",,Fair Oaks Adult Activity Center,,Colma,active,No wait.,,
1,2,2,,,,Apply by phone for an appointment.,Residents of San Mateo County age 55 or over,Provides training and job placement to eligibl...,"Age 55 or over, county resident and willing an...",,...,,"EMPLOYMENT/TRAINING SERVICES, Job Development,...",,Second Career Employment Program,,San Mateo County,active,Varies.,,
2,3,3,,,,Phone for information (403-4300 Ext. 4322).,Older adults age 55 or over who can benefit fr...,Offers supportive counseling services to San M...,Resident of San Mateo County age 55 or over,,...,,"Geriatric Counseling, Older Adults, Gay, Lesbi...",,Senior Peer Counseling,,San Mateo County,active,Varies.,,
3,4,4,,,,Apply by phone.,"Parents, children, families with problems of c...",Provides supervised visitation services and a ...,,,...,,"INDIVIDUAL AND FAMILY DEVELOPMENT SERVICES, Gr...",,Family Visitation Center,,San Mateo County,active,No wait.,,
4,5,5,,,,Phone for information.,Low-income working families with children tran...,Provides fixed 8% short term loans to eligible...,Eligibility: Low-income family with legal cust...,,...,,"COMMUNITY SERVICES, Speakers, Automobile Loans",,Economic Self-Sufficiency Program,,San Mateo County,active,,,
5,6,6,,,,Walk in or apply by phone for membership appli...,Any age,A multipurpose center offering a wide variety ...,,,...,,"ADULT PROTECTION AND CARE SERVICES, In-Home Su...",,Little House Recreational Activities,,San Mateo County,active,No wait.,,
6,7,7,,,,"Apply by phone or be referred by a doctor, soc...","Older adults who have memory or sensory loss, ...",Rosener House is a day center for older adults...,Age 18 or over,,...,,"ADULT PROTECTION AND CARE SERVICES, Adult Day ...",,Rosener House Adult Day Services,,"Belmont, Burlingame, East Palo Alto",active,No wait.,,
7,8,8,,,,Apply by phone.,"Senior citizens age 60 or over, disabled indiv...",Delivers a hot meal to the home of persons age...,Homebound person unable to cook or shop,,...,,"ADULT PROTECTION AND CARE SERVICES, Meal Sites...",,Meals on Wheels - South County,,"Belmont, East Palo Alto",active,No wait.,,
8,9,9,,,,Walk in. Proof of residency in California requ...,"Ethnic minorities, especially Spanish speaking","Provides general reading material, including b...",Resident of California to obtain a library card,,...,,"EDUCATION SERVICES, Library, Libraries, Public...",,Fair Oaks Branch,,San Mateo County,active,No wait.,,
9,10,10,,,,Walk in. Proof of California residency to rece...,,"Provides general reading and media materials, ...",Resident of California to obtain a card,,...,,"EDUCATION SERVICES, Library, Libraries, Public...",,Main Library,,San Mateo County,active,No wait.,,


## DataFrame

In pandas, tabular data is stored in a DataFrame.
- A DataFrame is two-dimensional.
- A DataFrame consists of rows and columns.
- Multiple Series that share an index together form a DataFrame.


`Series`: a one-dimensional labeled array.
- Every column of a DataFrame is a Series.
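
The relationship between the two structures can be sketched by building a DataFrame from two Series and then pulling one column back out. The names `names`, `scores`, and `frame` are just for illustration.

```python
import pandas as pd

names = pd.Series(["Asha", "Ravi"])
scores = pd.Series([90, 85])

# Each Series becomes one column; the dict keys become the column names.
frame = pd.DataFrame({"name": names, "score": scores})

print(frame.shape)                     # (2, 2)
print(frame.ndim, frame["score"].ndim) # 2 1
print(type(frame["score"]))
```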

In [18]:
type(df)

pandas.core.frame.DataFrame

### Accessing a column

`dataframe.column_name` or `dataframe["column_name"]`

In [19]:
df.application_process

0                            Walk in or apply by phone.
1                    Apply by phone for an appointment.
2           Phone for information (403-4300 Ext. 4322).
3                                       Apply by phone.
4                                Phone for information.
5     Walk in or apply by phone for membership appli...
6     Apply by phone or be referred by a doctor, soc...
7                                       Apply by phone.
8     Walk in. Proof of residency in California requ...
9     Walk in. Proof of California residency to rece...
10    Walk in. Proof of California residency require...
11    Walk in or apply by phone, email or webpage re...
12    Walk in. Proof of California residency require...
13    Call for appointment. Referral from human serv...
14            Walk in or through other agency referral.
15    Walk in. Written application, identification r...
16                                Call for information.
17    Call for screening appointment. Medical vi

In [20]:
type(df["application_process"])

pandas.core.series.Series
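
Both access styles return the same Series, but attribute access only works when the column name is a valid Python identifier and does not clash with an existing DataFrame attribute or method. A small sketch with made-up columns:

```python
import pandas as pd

frame = pd.DataFrame({
    "status": ["active", "defunct"],
    "wait time": ["No wait.", "Varies."],   # name contains a space
    "count": [3, 5],                        # name clashes with DataFrame.count()
})

# Same result either way for a simple name.
print(frame.status.equals(frame["status"]))   # True

# Names with spaces need bracket access.
print(frame["wait time"].tolist())            # ['No wait.', 'Varies.']

# frame.count is the built-in method, not the column.
print(callable(frame.count))                  # True
print(frame["count"].tolist())                # [3, 5]
```

Bracket access is the safer default; `tolist()` is also a convenient way to turn a column into a plain Python list.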

In [21]:
l = [1,2,3,4]
s = pd.Series(l)

In [22]:
s

0    1
1    2
2    3
3    4
dtype: int64

In [23]:
type(s)

pandas.core.series.Series

In [24]:
# Accessing elements of a Series by slicing
s[1:]

1    2
2    3
3    4
dtype: int64

In [25]:
s[2:4]

2    3
3    4
dtype: int64

- By default, the index is a range of integers starting at 0.
- We can also supply custom index labels.

In [26]:
d = pd.Series([100, 200, 300], index= ["Az", "By", "Cx"])

In [27]:
d

Az    100
By    200
Cx    300
dtype: int64

In [28]:
type(d)

pandas.core.series.Series

In [29]:
d["Az"]

np.int64(100)
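
With custom labels it helps to be explicit about whether a lookup is by label or by position. A minimal sketch using `.loc` (label-based) and `.iloc` (position-based) on the same Series as above:

```python
import pandas as pd

d = pd.Series([100, 200, 300], index=["Az", "By", "Cx"])

print(d.loc["By"])    # 200, looked up by label
print(d.iloc[1])      # 200, looked up by position

# Label slices include both end points; position slices exclude the stop.
by_label = d.loc["Az":"By"]
by_position = d.iloc[0:1]

print(by_label.tolist())      # [100, 200]
print(by_position.tolist())   # [100]
```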

To see all index labels of a Series:

In [30]:
d.index

Index(['Az', 'By', 'Cx'], dtype='object')

Resetting to the default integer index (`reset_index` returns a new Series; the original `d` is unchanged):

In [31]:
d.reset_index(drop=True)

0    100
1    200
2    300
dtype: int64

In [32]:
d

Az    100
By    200
Cx    300
dtype: int64
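
As the two outputs above show, `d` still has its custom labels after calling `reset_index`. To keep the result, assign it to a name. A short sketch (the variable names are illustrative):

```python
import pandas as pd

original = pd.Series([100, 200, 300], index=["Az", "By", "Cx"])

# The call returns a new Series; `original` keeps its labels.
renumbered = original.reset_index(drop=True)

print(list(original.index))     # ['Az', 'By', 'Cx']
print(list(renumbered.index))   # [0, 1, 2]
```

Without `drop=True`, the old labels would be kept as an extra column and the result would be a DataFrame instead of a Series.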

### Converting a Series to a DataFrame

In [33]:
pd.DataFrame(d)

Unnamed: 0,0
Az,100
By,200
Cx,300


### Accessing a defined number of rows

- `dataframe.head(n)` returns the first `n` rows; by default it returns the first five.
- `dataframe.tail(n)` returns the last `n` rows; by default it returns the last five.

In [34]:
df

Unnamed: 0,id,location_id,program_id,accepted_payments,alternate_name,application_process,audience,description,eligibility,email,...,interpretation_services,keywords,languages,name,required_documents,service_areas,status,wait_time,website,taxonomy_ids
0,1,1,,,,Walk in or apply by phone.,"Older adults age 55 or over, ethnic minorities...",A walk-in center for older adults that provide...,"Age 55 or over for most programs, age 60 or ov...",,...,,"ADULT PROTECTION AND CARE SERVICES, Meal Sites...",,Fair Oaks Adult Activity Center,,Colma,active,No wait.,,
1,2,2,,,,Apply by phone for an appointment.,Residents of San Mateo County age 55 or over,Provides training and job placement to eligibl...,"Age 55 or over, county resident and willing an...",,...,,"EMPLOYMENT/TRAINING SERVICES, Job Development,...",,Second Career Employment Program,,San Mateo County,active,Varies.,,
2,3,3,,,,Phone for information (403-4300 Ext. 4322).,Older adults age 55 or over who can benefit fr...,Offers supportive counseling services to San M...,Resident of San Mateo County age 55 or over,,...,,"Geriatric Counseling, Older Adults, Gay, Lesbi...",,Senior Peer Counseling,,San Mateo County,active,Varies.,,
3,4,4,,,,Apply by phone.,"Parents, children, families with problems of c...",Provides supervised visitation services and a ...,,,...,,"INDIVIDUAL AND FAMILY DEVELOPMENT SERVICES, Gr...",,Family Visitation Center,,San Mateo County,active,No wait.,,
4,5,5,,,,Phone for information.,Low-income working families with children tran...,Provides fixed 8% short term loans to eligible...,Eligibility: Low-income family with legal cust...,,...,,"COMMUNITY SERVICES, Speakers, Automobile Loans",,Economic Self-Sufficiency Program,,San Mateo County,active,,,
5,6,6,,,,Walk in or apply by phone for membership appli...,Any age,A multipurpose center offering a wide variety ...,,,...,,"ADULT PROTECTION AND CARE SERVICES, In-Home Su...",,Little House Recreational Activities,,San Mateo County,active,No wait.,,
6,7,7,,,,"Apply by phone or be referred by a doctor, soc...","Older adults who have memory or sensory loss, ...",Rosener House is a day center for older adults...,Age 18 or over,,...,,"ADULT PROTECTION AND CARE SERVICES, Adult Day ...",,Rosener House Adult Day Services,,"Belmont, Burlingame, East Palo Alto",active,No wait.,,
7,8,8,,,,Apply by phone.,"Senior citizens age 60 or over, disabled indiv...",Delivers a hot meal to the home of persons age...,Homebound person unable to cook or shop,,...,,"ADULT PROTECTION AND CARE SERVICES, Meal Sites...",,Meals on Wheels - South County,,"Belmont, East Palo Alto",active,No wait.,,
8,9,9,,,,Walk in. Proof of residency in California requ...,"Ethnic minorities, especially Spanish speaking","Provides general reading material, including b...",Resident of California to obtain a library card,,...,,"EDUCATION SERVICES, Library, Libraries, Public...",,Fair Oaks Branch,,San Mateo County,active,No wait.,,
9,10,10,,,,Walk in. Proof of California residency to rece...,,"Provides general reading and media materials, ...",Resident of California to obtain a card,,...,,"EDUCATION SERVICES, Library, Libraries, Public...",,Main Library,,San Mateo County,active,No wait.,,


In [35]:
df.head(2)

Unnamed: 0,id,location_id,program_id,accepted_payments,alternate_name,application_process,audience,description,eligibility,email,...,interpretation_services,keywords,languages,name,required_documents,service_areas,status,wait_time,website,taxonomy_ids
0,1,1,,,,Walk in or apply by phone.,"Older adults age 55 or over, ethnic minorities...",A walk-in center for older adults that provide...,"Age 55 or over for most programs, age 60 or ov...",,...,,"ADULT PROTECTION AND CARE SERVICES, Meal Sites...",,Fair Oaks Adult Activity Center,,Colma,active,No wait.,,
1,2,2,,,,Apply by phone for an appointment.,Residents of San Mateo County age 55 or over,Provides training and job placement to eligibl...,"Age 55 or over, county resident and willing an...",,...,,"EMPLOYMENT/TRAINING SERVICES, Job Development,...",,Second Career Employment Program,,San Mateo County,active,Varies.,,


In [36]:
df.tail()

Unnamed: 0,id,location_id,program_id,accepted_payments,alternate_name,application_process,audience,description,eligibility,email,...,interpretation_services,keywords,languages,name,required_documents,service_areas,status,wait_time,website,taxonomy_ids
18,19,19,,,,Call for screening appointment (650-347-3648).,,Provides free medical and dental care to those...,Low-income person without access to health care,,...,,"HEALTH SERVICES, Outpatient Care, Community Cl...",,San Mateo Free Medical Clinic,,"Belmont, Burlingame",active,Varies.,,
19,20,20,,,,Walk in.,,no unrequired fields for this service,,,...,,,,Service with blank fields,,,defunct,,,
20,21,21,,,,By phone during business hours.,,just a test service,,,...,,,,Service for Admin Test Location,,San Mateo County,inactive,,,
21,22,22,,"Cash, Check, Credit Card",Fotos para pasaportes,Walk in or apply by phone or mail,"Profit and nonprofit businesses, the public, m...",[NOTE THIS IS NOT A REAL SERVICE--THIS IS FOR ...,,passports@example.org,...,We offer 3-way interpretation services over th...,"Salud, Medicina",Spanish,Passport Photos,Government-issued picture identification,"Alameda County, San Mateo County",active,No wait to 2 weeks.,http://www.example.com,"105, 108, 108-05, 108-05-01, 111, 111-05"
22,23,22,,,,Walk in or apply by phone or mail,"Second service and nonprofit businesses, the p...",[NOTE THIS IS NOT A REAL ORGANIZATION--THIS IS...,,,...,,"Ruby on Rails/Postgres/Redis, testing, wic",,Example Service Name,,"San Mateo County, Alameda County",active,No wait to 2 weeks,http://www.example.com,


In [37]:
df.tail(3)

Unnamed: 0,id,location_id,program_id,accepted_payments,alternate_name,application_process,audience,description,eligibility,email,...,interpretation_services,keywords,languages,name,required_documents,service_areas,status,wait_time,website,taxonomy_ids
20,21,21,,,,By phone during business hours.,,just a test service,,,...,,,,Service for Admin Test Location,,San Mateo County,inactive,,,
21,22,22,,"Cash, Check, Credit Card",Fotos para pasaportes,Walk in or apply by phone or mail,"Profit and nonprofit businesses, the public, m...",[NOTE THIS IS NOT A REAL SERVICE--THIS IS FOR ...,,passports@example.org,...,We offer 3-way interpretation services over th...,"Salud, Medicina",Spanish,Passport Photos,Government-issued picture identification,"Alameda County, San Mateo County",active,No wait to 2 weeks.,http://www.example.com,"105, 108, 108-05, 108-05-01, 111, 111-05"
22,23,22,,,,Walk in or apply by phone or mail,"Second service and nonprofit businesses, the p...",[NOTE THIS IS NOT A REAL ORGANIZATION--THIS IS...,,,...,,"Ruby on Rails/Postgres/Redis, testing, wic",,Example Service Name,,"San Mateo County, Alameda County",active,No wait to 2 weeks,http://www.example.com,
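
The behaviour of `head` and `tail` can be checked on a small made-up frame, including two edge cases: asking for more rows than exist, and passing a negative number.

```python
import pandas as pd

frame = pd.DataFrame({"id": range(1, 11)})   # hypothetical frame with 10 rows

print(len(frame.head()))              # 5, the default
print(frame.tail(3)["id"].tolist())   # [8, 9, 10]

# Asking for more rows than exist simply returns everything.
print(len(frame.head(50)))            # 10

# A negative n means "all rows except the last |n|".
print(len(frame.head(-2)))            # 8
```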


To find out how many rows and columns there are:

`dataframe.shape` returns a tuple `(number of rows, number of columns)`.

`dataframe.columns` returns only the column names.


In [38]:
df.shape

(23, 22)

In [39]:
df.columns

Index(['id', 'location_id', 'program_id', 'accepted_payments',
       'alternate_name', 'application_process', 'audience', 'description',
       'eligibility', 'email', 'fees', 'funding_sources',
       'interpretation_services', 'keywords', 'languages', 'name',
       'required_documents', 'service_areas', 'status', 'wait_time', 'website',
       'taxonomy_ids'],
      dtype='object')
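
Because `shape` is an ordinary tuple, it can be unpacked directly. A minimal sketch on a made-up frame:

```python
import pandas as pd

frame = pd.DataFrame({"id": [1, 2, 3], "status": ["a", "b", "c"]})

n_rows, n_cols = frame.shape   # unpack the (rows, columns) tuple

print(n_rows, n_cols)          # 3 2
print(len(frame))              # 3, len() counts rows
print(len(frame.columns))      # 2
```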

Converting the column names into a list:
`list(dataframe.columns)`

In [40]:
list(df.columns)

['id',
 'location_id',
 'program_id',
 'accepted_payments',
 'alternate_name',
 'application_process',
 'audience',
 'description',
 'eligibility',
 'email',
 'fees',
 'funding_sources',
 'interpretation_services',
 'keywords',
 'languages',
 'name',
 'required_documents',
 'service_areas',
 'status',
 'wait_time',
 'website',
 'taxonomy_ids']

Accessing random rows:


`df.sample(n)` returns `n` randomly chosen rows; by default it returns one random row.

In [41]:
df.sample(2)

Unnamed: 0,id,location_id,program_id,accepted_payments,alternate_name,application_process,audience,description,eligibility,email,...,interpretation_services,keywords,languages,name,required_documents,service_areas,status,wait_time,website,taxonomy_ids
15,16,16,,,,"Walk in. Written application, identification r...",,Provides emergency assistance including food a...,None for emergency assistance,,...,,"COMMODITY SERVICES, Clothing/Personal Items, C...",,Sunnyvale Corps,,,active,No wait.,,
14,15,15,,,,Walk in or through other agency referral.,Adult alcoholic/drug addictive men and women w...,Provides a long-term (6-12 month) residential ...,"Age 21-60, detoxed, physically able and willin...",,...,,"ALCOHOLISM SERVICES, Residential Care, DRUG AB...",,Adult Rehabilitation Center,,"Alameda County, San Mateo County",active,Varies according to available beds for men and...,,
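
Since `sample` is random, rerunning the cell generally gives different rows. When a repeatable result is needed, the `random_state` parameter fixes the seed. A sketch on a made-up frame:

```python
import pandas as pd

frame = pd.DataFrame({"id": range(20)})   # hypothetical frame with 20 rows

# The same seed gives the same rows every time.
first = frame.sample(2, random_state=0)
second = frame.sample(2, random_state=0)
print(first.equals(second))   # True

# frac selects a fraction of the rows instead of a fixed count.
half = frame.sample(frac=0.5, random_state=0)
print(len(half))              # 10
```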


Checking the data type of each column (int, float, object, etc.) with `info()` and `dtypes`:

In [42]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 23 entries, 0 to 22
Data columns (total 22 columns):
 #   Column                   Non-Null Count  Dtype  
---  ------                   --------------  -----  
 0   id                       23 non-null     int64  
 1   location_id              23 non-null     int64  
 2   program_id               0 non-null      float64
 3   accepted_payments        1 non-null      object 
 4   alternate_name           1 non-null      object 
 5   application_process      23 non-null     object 
 6   audience                 14 non-null     object 
 7   description              23 non-null     object 
 8   eligibility              17 non-null     object 
 9   email                    1 non-null      object 
 10  fees                     21 non-null     object 
 11  funding_sources          21 non-null     object 
 12  interpretation_services  1 non-null      object 
 13  keywords                 21 non-null     object 
 14  languages                1 n

In [43]:
df.dtypes

id                           int64
location_id                  int64
program_id                 float64
accepted_payments           object
alternate_name              object
application_process         object
audience                    object
description                 object
eligibility                 object
email                       object
fees                        object
funding_sources             object
interpretation_services     object
keywords                    object
languages                   object
name                        object
required_documents          object
service_areas               object
status                      object
wait_time                   object
website                     object
taxonomy_ids                object
dtype: object
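
In the output above, `program_id` is `float64` even though it looks like it should hold integers. That is because the column contains only missing values, which Pandas stores as `NaN`, a floating-point value. A small sketch on made-up data reproduces this and shows how to pick out the numeric columns:

```python
import io
import pandas as pd

# Hypothetical data: program_id is empty in every row.
csv_text = "id,program_id,name\n1,,A\n2,,B\n"
frame = pd.read_csv(io.StringIO(csv_text))

print(frame["id"].dtype)           # int64
print(frame["program_id"].dtype)   # float64, because every value is NaN

# Keep only the numeric columns.
numeric = frame.select_dtypes(include="number")
print(list(numeric.columns))       # ['id', 'program_id']

# Convert a column to another type explicitly.
as_float = frame["id"].astype("float64")
print(as_float.dtype)              # float64
```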

In [44]:
df['id']

0      1
1      2
2      3
3      4
4      5
5      6
6      7
7      8
8      9
9     10
10    11
11    12
12    13
13    14
14    15
15    16
16    17
17    18
18    19
19    20
20    21
21    22
22    23
Name: id, dtype: int64

In [45]:
df['application_process']  # string columns are reported with dtype object

0                            Walk in or apply by phone.
1                    Apply by phone for an appointment.
2           Phone for information (403-4300 Ext. 4322).
3                                       Apply by phone.
4                                Phone for information.
5     Walk in or apply by phone for membership appli...
6     Apply by phone or be referred by a doctor, soc...
7                                       Apply by phone.
8     Walk in. Proof of residency in California requ...
9     Walk in. Proof of California residency to rece...
10    Walk in. Proof of California residency require...
11    Walk in or apply by phone, email or webpage re...
12    Walk in. Proof of California residency require...
13    Call for appointment. Referral from human serv...
14            Walk in or through other agency referral.
15    Walk in. Written application, identification r...
16                                Call for information.
17    Call for screening appointment. Medical vi

In [46]:
list(df['application_process'])

['Walk in or apply by phone.',
 'Apply by phone for an appointment.',
 'Phone for information (403-4300 Ext. 4322).',
 'Apply by phone.',
 'Phone for information.',
 'Walk in or apply by phone for membership application.',
 'Apply by phone or be referred by a doctor, social worker or other professional. All prospective participants are interviewed individually before starting the program. A recent physical examination is required, including a TB test.',
 'Apply by phone.',
 'Walk in. Proof of residency in California required to receive a library card.',
 'Walk in. Proof of California residency to receive a library card.',
 'Walk in. Proof of California residency required to receive a library card.',
 'Walk in or apply by phone, email or webpage registration.',
 'Walk in. Proof of California residency required to receive a library card.',
 'Call for appointment. Referral from human service professional preferred for emergency assistance.',
 'Walk in or through other agency referral.',
 

### Converting Series to a DataFrame

In [47]:
s = pd.Series([2, 3, 4], index=[100, "Anu", 200])

In [48]:
s

100    2
Anu    3
200    4
dtype: int64

In [49]:
d = pd.DataFrame(s)

In [50]:
d

Unnamed: 0,0
100,2
Anu,3
200,4


In [51]:
type(d)

pandas.core.frame.DataFrame
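
As a small sketch of the same conversion, `Series.to_frame` can name the column while converting, so the result does not end up with the default column label `0`. The column name `"value"` here is only a placeholder chosen for illustration.

```python
import pandas as pd

s = pd.Series([2, 3, 4], index=[100, "Anu", 200])

# to_frame() keeps the index and names the single column in one step.
frame = s.to_frame(name="value")  # "value" is a placeholder name

print(frame.columns.tolist())
print(frame.shape)
```

The index labels (`100`, `"Anu"`, `200`) carry over unchanged; only the column label differs from `pd.DataFrame(s)`.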

### Inserting a New Column into an Existing DataFrame

In [52]:
d["New Column"] = "Anu"

In [53]:
d

Unnamed: 0,0,New Column
100,2,Anu
Anu,3,Anu
200,4,Anu


In [54]:
d["New Column"]

100    Anu
Anu    Anu
200    Anu
Name: New Column, dtype: object

In [55]:
d["New Col 1"] = [1, 2, 3]

In [56]:
d

Unnamed: 0,0,New Column,New Col 1
100,2,Anu,1
Anu,3,Anu,2
200,4,Anu,3
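
Two related ways to add columns, shown as a rough sketch on a small stand-in frame (the column names `first` and `doubled` are made up for the example): `insert` places a column at a chosen position and changes the frame in place, while `assign` returns a new frame and leaves the original alone. A list used as a column must have one value per row.

```python
import pandas as pd

d = pd.DataFrame({"Col1": [2, 3, 4]}, index=[100, "Anu", 200])

# insert() modifies d in place and puts the new column at position 0.
d.insert(0, "first", ["a", "b", "c"])

# assign() returns a new DataFrame; d itself is not changed.
e = d.assign(doubled=d["Col1"] * 2)

# A list with the wrong number of values is rejected.
try:
    d["too_short"] = [1, 2]
    length_error = False
except ValueError:
    length_error = True

print(d.columns.tolist())
print(e.columns.tolist())
```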


### Changing Column Names

In [57]:
d.columns

Index([0, 'New Column', 'New Col 1'], dtype='object')

In [58]:
d.columns = ["Col1", "Col2", "Col3"]

In [59]:
d

Unnamed: 0,Col1,Col2,Col3
100,2,Anu,1
Anu,3,Anu,2
200,4,Anu,3
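
Assigning to `d.columns` replaces every name at once. When only some columns need new names, `rename` with a mapping is one alternative; the sketch below rebuilds a frame like `d` from before the rename so it can run on its own.

```python
import pandas as pd

d = pd.DataFrame(
    {0: [2, 3, 4], "New Column": ["Anu"] * 3, "New Col 1": [1, 2, 3]},
    index=[100, "Anu", 200],
)

# rename() only touches the columns listed in the mapping
# and returns a new DataFrame.
renamed = d.rename(columns={0: "Col1", "New Col 1": "Col3"})

print(renamed.columns.tolist())
```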


### Resetting the Index

In [60]:
d

Unnamed: 0,Col1,Col2,Col3
100,2,Anu,1
Anu,3,Anu,2
200,4,Anu,3


In [61]:
d.reset_index()

Unnamed: 0,index,Col1,Col2,Col3
0,100,2,Anu,1
1,Anu,3,Anu,2
2,200,4,Anu,3


In [62]:
d.reset_index(drop=True)

Unnamed: 0,Col1,Col2,Col3
0,2,Anu,1
1,3,Anu,2
2,4,Anu,3
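
Note that `reset_index` returns a new DataFrame; the cells above display the result but `d` itself keeps its old index. A minimal sketch of keeping the change, using a small stand-in for `d`:

```python
import pandas as pd

d = pd.DataFrame({"Col1": [2, 3, 4]}, index=[100, "Anu", 200])

# The result is discarded here, so d is unchanged.
d.reset_index(drop=True)
index_before = d.index.tolist()

# Assign the result back to keep the new 0..n-1 index.
d = d.reset_index(drop=True)
index_after = d.index.tolist()

print(index_before, index_after)
```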


### Accessing One Column

- A Series is one-dimensional, and every column of a DataFrame is a Series, so selecting one column returns 1-D data. Asking for two columns with `df["col1", "col2"]` raises an error; pass a list of names instead, which returns a 2-D DataFrame.

In [63]:
df["status"]

0       active
1       active
2       active
3       active
4       active
5       active
6       active
7       active
8       active
9       active
10      active
11      active
12      active
13      active
14      active
15      active
16      active
17      active
18      active
19     defunct
20    inactive
21      active
22      active
Name: status, dtype: object

### Accessing Two Columns
To access two or more columns, pass a list of column names inside the brackets:

`dataframe[["col1", "col2"]]`

The result is 2-D: a DataFrame rather than a Series.

In [64]:
df[["status", "languages"]]

Unnamed: 0,status,languages
0,active,
1,active,
2,active,
3,active,
4,active,
5,active,
6,active,
7,active,
8,active,
9,active,


In [65]:
df_subset = df[["name", "status", "languages"]]
df_subset

Unnamed: 0,name,status,languages
0,Fair Oaks Adult Activity Center,active,
1,Second Career Employment Program,active,
2,Senior Peer Counseling,active,
3,Family Visitation Center,active,
4,Economic Self-Sufficiency Program,active,
5,Little House Recreational Activities,active,
6,Rosener House Adult Day Services,active,
7,Meals on Wheels - South County,active,
8,Fair Oaks Branch,active,
9,Main Library,active,


In [66]:
type(df_subset)

pandas.core.frame.DataFrame
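
The difference between single and double brackets can be checked directly. This is a small sketch on a two-row stand-in frame (the rows are taken from the output above, not from the full dataset):

```python
import pandas as pd

df = pd.DataFrame(
    {"name": ["Fair Oaks Branch", "Main Library"], "status": ["active", "active"]}
)

one = df["status"]                 # single brackets -> Series (1-D)
two = df[["name", "status"]]       # list of names   -> DataFrame (2-D)
single_as_frame = df[["status"]]   # one name in a list is still a DataFrame

# Two names without the inner list are treated as one key and fail.
try:
    df["name", "status"]
    error_name = None
except KeyError as exc:
    error_name = type(exc).__name__

print(type(one).__name__, type(two).__name__, error_name)
```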

# Reading an Excel File

In [67]:
!pip install openpyxl

Defaulting to user installation because normal site-packages is not writeable
Collecting openpyxl
  Using cached openpyxl-3.1.5-py2.py3-none-any.whl.metadata (2.5 kB)
Collecting et-xmlfile (from openpyxl)
  Using cached et_xmlfile-2.0.0-py3-none-any.whl.metadata (2.7 kB)
Using cached openpyxl-3.1.5-py2.py3-none-any.whl (250 kB)
Using cached et_xmlfile-2.0.0-py3-none-any.whl (18 kB)
Installing collected packages: et-xmlfile, openpyxl


In [68]:
df1 = pd.read_excel("excel.xlsx")
df1

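ImportError: Missing optional dependency 'openpyxl'.  Use pip or conda to install openpyxl.

Note: this error most likely appeared because `openpyxl` was not yet importable when the cell ran. The outputs below come from a run in which the file loaded successfully.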

In [None]:
df1.shape

(20, 20)

In [None]:
df1.columns

Index(['Int_Col_1', 'Int_Col_2', 'Int_Col_3', 'Int_Col_4', 'Int_Col_5',
       'Float_Col_1', 'Float_Col_2', 'Float_Col_3', 'Float_Col_4',
       'Float_Col_5', 'String_Col_1', 'String_Col_2', 'String_Col_3',
       'String_Col_4', 'String_Col_5', 'Feedback_Col_1', 'Feedback_Col_2',
       'Feedback_Col_3', 'Feedback_Col_4', 'Feedback_Col_5'],
      dtype='object')

In [None]:
df1.dtypes

Int_Col_1           int64
Int_Col_2           int64
Int_Col_3           int64
Int_Col_4           int64
Int_Col_5           int64
Float_Col_1       float64
Float_Col_2       float64
Float_Col_3       float64
Float_Col_4       float64
Float_Col_5       float64
String_Col_1       object
String_Col_2       object
String_Col_3       object
String_Col_4       object
String_Col_5       object
Feedback_Col_1     object
Feedback_Col_2     object
Feedback_Col_3     object
Feedback_Col_4     object
Feedback_Col_5     object
dtype: object
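
Since `read_excel` depends on the optional `openpyxl` package for `.xlsx` files, one possible guard is to check for it before reading. The helper name `load_excel` is hypothetical, and the sketch assumes the file is a regular `.xlsx` workbook.

```python
import importlib.util
from pathlib import Path

import pandas as pd


def load_excel(path):
    """Return a DataFrame, or None when openpyxl or the file is missing."""
    if importlib.util.find_spec("openpyxl") is None:
        print("openpyxl is not installed in this environment")
        return None
    if not Path(path).exists():
        print(f"{path} not found")
        return None
    return pd.read_excel(path, engine="openpyxl")


df1 = load_excel("excel.xlsx")
```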

## Reading a CSV File from a GitHub/Drive Link

In [None]:
df3 = pd.read_csv("https://raw.githubusercontent.com/pandas-dev/pandas/master/doc/data/titanic.csv")

In [None]:
df3

Unnamed: 0,PassengerId,Survived,Pclass,Name,Sex,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked
0,1,0,3,"Braund, Mr. Owen Harris",male,22.0,1,0,A/5 21171,7.2500,,S
1,2,1,1,"Cumings, Mrs. John Bradley (Florence Briggs Th...",female,38.0,1,0,PC 17599,71.2833,C85,C
2,3,1,3,"Heikkinen, Miss Laina",female,26.0,0,0,STON/O2. 3101282,7.9250,,S
3,4,1,1,"Futrelle, Mrs. Jacques Heath (Lily May Peel)",female,35.0,1,0,113803,53.1000,C123,S
4,5,0,3,"Allen, Mr. William Henry",male,35.0,0,0,373450,8.0500,,S
...,...,...,...,...,...,...,...,...,...,...,...,...
886,887,0,2,"Montvila, Rev. Juozas",male,27.0,0,0,211536,13.0000,,S
887,888,1,1,"Graham, Miss Margaret Edith",female,19.0,0,0,112053,30.0000,B42,S
888,889,0,3,"Johnston, Miss Catherine Helen ""Carrie""",female,,1,2,W./C. 6607,23.4500,,S
889,890,1,1,"Behr, Mr. Karl Howell",male,26.0,0,0,111369,30.0000,C148,C


In [None]:
df3.columns

Index(['PassengerId', 'Survived', 'Pclass', 'Name', 'Sex', 'Age', 'SibSp',
       'Parch', 'Ticket', 'Fare', 'Cabin', 'Embarked'],
      dtype='object')

In [None]:
df3.info  # without parentheses this only shows the bound method; call df3.info() for the column summary

<bound method DataFrame.info of      PassengerId  Survived  Pclass  \
0              1         0       3   
1              2         1       1   
2              3         1       3   
3              4         1       1   
4              5         0       3   
..           ...       ...     ...   
886          887         0       2   
887          888         1       1   
888          889         0       3   
889          890         1       1   
890          891         0       3   

                                                  Name     Sex   Age  SibSp  \
0                              Braund, Mr. Owen Harris    male  22.0      1   
1    Cumings, Mrs. John Bradley (Florence Briggs Th...  female  38.0      1   
2                                Heikkinen, Miss Laina  female  26.0      0   
3         Futrelle, Mrs. Jacques Heath (Lily May Peel)  female  35.0      1   
4                             Allen, Mr. William Henry    male  35.0      0   
..                                   
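
The output above is the method object itself, because `info` was referenced without being called. A minimal sketch of the difference, on a small made-up frame rather than the Titanic data:

```python
from io import StringIO

import pandas as pd

df = pd.DataFrame(
    {"Age": [22.0, None, 26.0], "Sex": ["male", "female", "female"]}
)

# df.info is a method; it has to be called to produce the summary.
print(callable(df.info))

# info() prints its report; passing buf= captures the text instead.
buffer = StringIO()
df.info(buf=buffer)
summary = buffer.getvalue()
print(summary)
```

The summary lists each column with its non-null count and dtype, which is a quick way to spot missing values.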

In [None]:
df3.shape

(891, 12)

In [None]:
df3.Sex

0        male
1      female
2      female
3      female
4        male
        ...  
886      male
887    female
888    female
889      male
890      male
Name: Sex, Length: 891, dtype: object

In [None]:
df3.columns

Index(['PassengerId', 'Survived', 'Pclass', 'Name', 'Sex', 'Age', 'SibSp',
       'Parch', 'Ticket', 'Fare', 'Cabin', 'Embarked'],
      dtype='object')

In [None]:
df3[["Sex", "Fare", "Cabin"]]

Unnamed: 0,Sex,Fare,Cabin
0,male,7.2500,
1,female,71.2833,C85
2,female,7.9250,
3,female,53.1000,C123
4,male,8.0500,
...,...,...,...
886,male,13.0000,
887,female,30.0000,B42
888,female,23.4500,
889,male,30.0000,C148
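
Columns can also be chosen while reading, instead of loading everything and selecting afterwards. The sketch below uses a few rows in an in-memory string as a stand-in for the file at the URL, so it runs without a network connection; `usecols` and `nrows` are standard `read_csv` parameters.

```python
from io import StringIO

import pandas as pd

# A tiny stand-in for the remote CSV file.
csv_text = """PassengerId,Sex,Fare,Cabin
1,male,7.25,
2,female,71.2833,C85
3,female,7.925,
"""

# usecols keeps only the named columns; nrows limits the rows read.
subset = pd.read_csv(StringIO(csv_text), usecols=["Sex", "Fare"], nrows=2)

print(subset)
```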


### Reading an HTML File

In [None]:
!pip install lxml html5lib beautifulsoup4


Collecting lxml
  Downloading lxml-6.0.0-cp313-cp313-win_amd64.whl.metadata (6.8 kB)
Collecting html5lib
  Downloading html5lib-1.1-py2.py3-none-any.whl.metadata (16 kB)
Downloading lxml-6.0.0-cp313-cp313-win_amd64.whl (4.0 MB)


In [None]:
url_list = pd.read_html("https://www.basketball-reference.com/leagues/NBA_2019_totals.html")

In [None]:
url_list

[        Rk          Player   Age Team  Pos     G    GS      MP     FG     FGA  \
 0      1.0    James Harden  29.0  HOU   PG  78.0  78.0  2867.0  843.0  1909.0   
 1      2.0     Paul George  28.0  OKC   SF  77.0  77.0  2841.0  707.0  1614.0   
 2      3.0    Kemba Walker  28.0  CHO   PG  82.0  82.0  2863.0  731.0  1684.0   
 3      4.0    Bradley Beal  25.0  WAS   SG  82.0  82.0  3028.0  764.0  1609.0   
 4      5.0  Damian Lillard  28.0  POR   PG  80.0  80.0  2838.0  681.0  1533.0   
 ..     ...             ...   ...  ...  ...   ...   ...     ...    ...     ...   
 704  527.0     Zach Lofton  26.0  DET   SG   1.0   0.0     4.0    0.0     1.0   
 705  528.0    Kobi Simmons  21.0  CLE   PG   1.0   0.0     2.0    0.0     0.0   
 706  529.0      Tyler Ulis  23.0  CHI   PG   1.0   0.0     1.0    0.0     0.0   
 707  530.0     Okaro White  26.0  WAS   PF   3.0   0.0     6.0    0.0     2.0   
 708    NaN  League Average   NaN  NaN  NaN   NaN   NaN     NaN    NaN     NaN   
 
      ...    D

In [None]:
type(url_list)

list

`read_html` returns a list of DataFrames, one for each table found on the page. Pick the table you need by its position in the list:

In [None]:
df4 = url_list[0]
df4

Unnamed: 0,Rk,Player,Age,Team,Pos,G,GS,MP,FG,FGA,...,DRB,TRB,AST,STL,BLK,TOV,PF,PTS,Trp-Dbl,Awards
0,1.0,James Harden,29.0,HOU,PG,78.0,78.0,2867.0,843.0,1909.0,...,452.0,518.0,586.0,158.0,58.0,387.0,244.0,2818.0,7.0,"MVP-2,AS,NBA1"
1,2.0,Paul George,28.0,OKC,SF,77.0,77.0,2841.0,707.0,1614.0,...,523.0,628.0,318.0,170.0,34.0,205.0,214.0,2159.0,1.0,"MVP-3,DPOY-3,AS,NBA1,DEF1"
2,3.0,Kemba Walker,28.0,CHO,PG,82.0,82.0,2863.0,731.0,1684.0,...,309.0,361.0,484.0,102.0,34.0,211.0,131.0,2102.0,0.0,"AS,NBA3"
3,4.0,Bradley Beal,25.0,WAS,SG,82.0,82.0,3028.0,764.0,1609.0,...,322.0,411.0,448.0,121.0,58.0,224.0,226.0,2099.0,2.0,AS
4,5.0,Damian Lillard,28.0,POR,PG,80.0,80.0,2838.0,681.0,1533.0,...,303.0,371.0,551.0,88.0,34.0,212.0,148.0,2067.0,0.0,"MVP-6,AS,NBA2"
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
704,527.0,Zach Lofton,26.0,DET,SG,1.0,0.0,4.0,0.0,1.0,...,0.0,0.0,0.0,1.0,0.0,1.0,0.0,0.0,0.0,
705,528.0,Kobi Simmons,21.0,CLE,PG,1.0,0.0,2.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,
706,529.0,Tyler Ulis,23.0,CHI,PG,1.0,0.0,1.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,
707,530.0,Okaro White,26.0,WAS,PF,3.0,0.0,6.0,0.0,2.0,...,1.0,2.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,


In [None]:
type(df4)

pandas.core.frame.DataFrame
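
To see why the result is a list, here is a rough sketch with two tiny tables in an HTML string (the table contents are invented for the example). It assumes one of the parsers installed above is available and falls back to an empty list otherwise.

```python
from io import StringIO

import pandas as pd

html = """
<table><tr><th>Player</th><th>PTS</th></tr><tr><td>A</td><td>10</td></tr></table>
<table><tr><th>Team</th><th>Wins</th></tr><tr><td>X</td><td>5</td></tr></table>
"""

try:
    # One DataFrame per <table> element, in page order.
    tables = pd.read_html(StringIO(html))
except ImportError:
    # No HTML parser (lxml, or bs4 with html5lib) is installed.
    tables = []

print(len(tables))
```

On pages with many tables, the `match` parameter of `read_html` can narrow the list to tables containing a given text.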

In [None]:
df4.shape

(709, 32)

In [None]:
df4.size

22688

In [None]:
df4.columns

Index(['Rk', 'Player', 'Age', 'Team', 'Pos', 'G', 'GS', 'MP', 'FG', 'FGA',
       'FG%', '3P', '3PA', '3P%', '2P', '2PA', '2P%', 'eFG%', 'FT', 'FTA',
       'FT%', 'ORB', 'DRB', 'TRB', 'AST', 'STL', 'BLK', 'TOV', 'PF', 'PTS',
       'Trp-Dbl', 'Awards'],
      dtype='object')

In [None]:
df4.head()

Unnamed: 0,Rk,Player,Age,Team,Pos,G,GS,MP,FG,FGA,...,DRB,TRB,AST,STL,BLK,TOV,PF,PTS,Trp-Dbl,Awards
0,1.0,James Harden,29.0,HOU,PG,78.0,78.0,2867.0,843.0,1909.0,...,452.0,518.0,586.0,158.0,58.0,387.0,244.0,2818.0,7.0,"MVP-2,AS,NBA1"
1,2.0,Paul George,28.0,OKC,SF,77.0,77.0,2841.0,707.0,1614.0,...,523.0,628.0,318.0,170.0,34.0,205.0,214.0,2159.0,1.0,"MVP-3,DPOY-3,AS,NBA1,DEF1"
2,3.0,Kemba Walker,28.0,CHO,PG,82.0,82.0,2863.0,731.0,1684.0,...,309.0,361.0,484.0,102.0,34.0,211.0,131.0,2102.0,0.0,"AS,NBA3"
3,4.0,Bradley Beal,25.0,WAS,SG,82.0,82.0,3028.0,764.0,1609.0,...,322.0,411.0,448.0,121.0,58.0,224.0,226.0,2099.0,2.0,AS
4,5.0,Damian Lillard,28.0,POR,PG,80.0,80.0,2838.0,681.0,1533.0,...,303.0,371.0,551.0,88.0,34.0,212.0,148.0,2067.0,0.0,"MVP-6,AS,NBA2"


### Saving the DataFrame to the Local Machine as a CSV File

- By default, `to_csv` writes the DataFrame index as the first column of the file.
- To leave the index out, pass `index=False`.

In [None]:
df4.to_csv("player.csv")
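
The effect of `index=False` is easiest to see on the header line. In this sketch, `to_csv` is called without a path so that it returns the CSV text instead of writing a file; the two-row frame is a made-up stand-in for `df4`.

```python
from io import StringIO

import pandas as pd

df = pd.DataFrame({"Player": ["A", "B"], "PTS": [10, 20]})

with_index = df.to_csv()                # index written as an unnamed first column
without_index = df.to_csv(index=False)  # only the data columns

print(with_index.splitlines()[0])
print(without_index.splitlines()[0])

# Reading the first version back shows the saved index as "Unnamed: 0".
round_trip = pd.read_csv(StringIO(with_index))
print(round_trip.columns.tolist())
```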

### Reading JSON Data from a URL

In [None]:
url = "https://api.github.com/repos/hadley/ggplot2/issues"

In [None]:
pd.read_json(url)

Unnamed: 0,url,repository_url,labels_url,comments_url,events_url,html_url,id,node_id,number,title,...,active_lock_reason,sub_issues_summary,body,closed_by,reactions,timeline_url,performed_via_github_app,state_reason,draft,pull_request
0,https://api.github.com/repos/tidyverse/ggplot2...,https://api.github.com/repos/tidyverse/ggplot2,https://api.github.com/repos/tidyverse/ggplot2...,https://api.github.com/repos/tidyverse/ggplot2...,https://api.github.com/repos/tidyverse/ggplot2...,https://github.com/tidyverse/ggplot2/issues/6568,3293130750,I_kwDNS-7OxEkv_g,6568,bug(API): Missing exported `build_ggplot()`,...,,"{'total': 0, 'completed': 0, 'percent_complete...",Related: \n\n> Had to do a small consession be...,,{'url': 'https://api.github.com/repos/tidyvers...,https://api.github.com/repos/tidyverse/ggplot2...,,,,
1,https://api.github.com/repos/tidyverse/ggplot2...,https://api.github.com/repos/tidyverse/ggplot2,https://api.github.com/repos/tidyverse/ggplot2...,https://api.github.com/repos/tidyverse/ggplot2...,https://api.github.com/repos/tidyverse/ggplot2...,https://github.com/tidyverse/ggplot2/pull/6566,3288448462,PR_kwDNS-7OofOJJA,6566,v4.0.0,...,,,Fix #6565,,{'url': 'https://api.github.com/repos/tidyvers...,https://api.github.com/repos/tidyverse/ggplot2...,,,0.0,{'url': 'https://api.github.com/repos/tidyvers...
2,https://api.github.com/repos/tidyverse/ggplot2...,https://api.github.com/repos/tidyverse/ggplot2,https://api.github.com/repos/tidyverse/ggplot2...,https://api.github.com/repos/tidyverse/ggplot2...,https://api.github.com/repos/tidyverse/ggplot2...,https://github.com/tidyverse/ggplot2/issues/6565,3288414732,I_kwDNS-7OxAE6DA,6565,Release ggplot2 4.0.0,...,,"{'total': 0, 'completed': 0, 'percent_complete...",Prepare for release:\n\n* [x] `git pull`\n* [x...,,{'url': 'https://api.github.com/repos/tidyvers...,https://api.github.com/repos/tidyverse/ggplot2...,,,,
3,https://api.github.com/repos/tidyverse/ggplot2...,https://api.github.com/repos/tidyverse/ggplot2,https://api.github.com/repos/tidyverse/ggplot2...,https://api.github.com/repos/tidyverse/ggplot2...,https://api.github.com/repos/tidyverse/ggplot2...,https://github.com/tidyverse/ggplot2/issues/6562,3263154068,I_kwDNS-7Own_HlA,6562,geom_jitter fails when coordinate values have ...,...,,"{'total': 0, 'completed': 0, 'percent_complete...",When geom_jitter is given some data where the ...,,{'url': 'https://api.github.com/repos/tidyvers...,https://api.github.com/repos/tidyverse/ggplot2...,,,,
4,https://api.github.com/repos/tidyverse/ggplot2...,https://api.github.com/repos/tidyverse/ggplot2,https://api.github.com/repos/tidyverse/ggplot2...,https://api.github.com/repos/tidyverse/ggplot2...,https://api.github.com/repos/tidyverse/ggplot2...,https://github.com/tidyverse/ggplot2/issues/6559,3245700751,I_kwDNS-7OwXV2jw,6559,expose `stat` argument in geom_vline and geom_...,...,,"{'total': 0, 'completed': 0, 'percent_complete...",I think it would be nice to expose the stat ar...,,{'url': 'https://api.github.com/repos/tidyvers...,https://api.github.com/repos/tidyverse/ggplot2...,,,,
5,https://api.github.com/repos/tidyverse/ggplot2...,https://api.github.com/repos/tidyverse/ggplot2,https://api.github.com/repos/tidyverse/ggplot2...,https://api.github.com/repos/tidyverse/ggplot2...,https://api.github.com/repos/tidyverse/ggplot2...,https://github.com/tidyverse/ggplot2/pull/6558,3244273485,PR_kwDNS-7On6HETQ,6558,Revdepcheck on main,...,,,,,{'url': 'https://api.github.com/repos/tidyvers...,https://api.github.com/repos/tidyverse/ggplot2...,,,1.0,{'url': 'https://api.github.com/repos/tidyvers...
6,https://api.github.com/repos/tidyverse/ggplot2...,https://api.github.com/repos/tidyverse/ggplot2,https://api.github.com/repos/tidyverse/ggplot2...,https://api.github.com/repos/tidyverse/ggplot2...,https://api.github.com/repos/tidyverse/ggplot2...,https://github.com/tidyverse/ggplot2/pull/6557,3242126133,PR_kwDNS-7On4Rkxw,6557,fix: ensure S7 constructor matches class name ...,...,,,Fix partial: https://github.com/tidyverse/ggpl...,,{'url': 'https://api.github.com/repos/tidyvers...,https://api.github.com/repos/tidyverse/ggplot2...,,,0.0,{'url': 'https://api.github.com/repos/tidyvers...
7,https://api.github.com/repos/tidyverse/ggplot2...,https://api.github.com/repos/tidyverse/ggplot2,https://api.github.com/repos/tidyverse/ggplot2...,https://api.github.com/repos/tidyverse/ggplot2...,https://api.github.com/repos/tidyverse/ggplot2...,https://github.com/tidyverse/ggplot2/issues/6555,3235043492,I_kwDNS-7OwNLYpA,6555,Feature request: `position_stack()` alignment,...,,"{'total': 0, 'completed': 0, 'percent_complete...","Currently, `position_stack()` bottom-aligns (p...",,{'url': 'https://api.github.com/repos/tidyvers...,https://api.github.com/repos/tidyverse/ggplot2...,,,,
8,https://api.github.com/repos/tidyverse/ggplot2...,https://api.github.com/repos/tidyverse/ggplot2,https://api.github.com/repos/tidyverse/ggplot2...,https://api.github.com/repos/tidyverse/ggplot2...,https://api.github.com/repos/tidyverse/ggplot2...,https://github.com/tidyverse/ggplot2/issues/6554,3234337883,I_kwDNS-7OwMgUWw,6554,Plot boundary calculated incorrectly with mult...,...,,"{'total': 0, 'completed': 0, 'percent_complete...",When building a plot with an automatically-wra...,,{'url': 'https://api.github.com/repos/tidyvers...,https://api.github.com/repos/tidyverse/ggplot2...,,,,
9,https://api.github.com/repos/tidyverse/ggplot2...,https://api.github.com/repos/tidyverse/ggplot2,https://api.github.com/repos/tidyverse/ggplot2...,https://api.github.com/repos/tidyverse/ggplot2...,https://api.github.com/repos/tidyverse/ggplot2...,https://github.com/tidyverse/ggplot2/issues/6553,3229186266,I_kwDNS-7OwHl42g,6553,add stat_manual example,...,,"{'total': 0, 'completed': 0, 'percent_complete...",I think it would be nice to add a dplyr exampl...,,{'url': 'https://api.github.com/repos/tidyvers...,https://api.github.com/repos/tidyverse/ggplot2...,,,,


#### Another Way to Fetch JSON Data from a URL

In [None]:
import requests
data = requests.get(url)

In [None]:
data  # response 200 = successful

<Response [200]>

In [None]:
data.json()

[{'url': 'https://api.github.com/repos/tidyverse/ggplot2/issues/6568',
  'repository_url': 'https://api.github.com/repos/tidyverse/ggplot2',
  'labels_url': 'https://api.github.com/repos/tidyverse/ggplot2/issues/6568/labels{/name}',
  'comments_url': 'https://api.github.com/repos/tidyverse/ggplot2/issues/6568/comments',
  'events_url': 'https://api.github.com/repos/tidyverse/ggplot2/issues/6568/events',
  'html_url': 'https://github.com/tidyverse/ggplot2/issues/6568',
  'id': 3293130750,
  'node_id': 'I_kwDNS-7OxEkv_g',
  'number': 6568,
  'title': 'bug(API): Missing exported `build_ggplot()`',
  'user': {'login': 'schloerke',
   'id': 93231,
   'node_id': 'MDQ6VXNlcjkzMjMx',
   'avatar_url': 'https://avatars.githubusercontent.com/u/93231?v=4',
   'gravatar_id': '',
   'url': 'https://api.github.com/users/schloerke',
   'html_url': 'https://github.com/schloerke',
   'followers_url': 'https://api.github.com/users/schloerke/followers',
   'following_url': 'https://api.github.com/users/sc
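
The parsed JSON is a list of dictionaries, some of which contain nested dictionaries such as `user`. One way to turn that into a flat table is `pd.json_normalize`. The records below are invented and only mimic the shape of the response, so the sketch runs offline.

```python
import pandas as pd

# Hypothetical records shaped like the API response above.
records = [
    {"number": 1, "title": "first issue", "user": {"login": "alice", "id": 10}},
    {"number": 2, "title": "second issue", "user": {"login": "bob", "id": 11}},
]

# Nested keys become dotted column names such as "user.login".
flat = pd.json_normalize(records)

print(flat.columns.tolist())

# With a real response one could write, for example:
#   data.raise_for_status()               # stop early on a 4xx/5xx status
#   flat = pd.json_normalize(data.json())
```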




# Some Advanced Functions in Pandas

In [69]:
import pandas as pd

In [70]:
df = pd.read_csv("https://raw.githubusercontent.com/pandas-dev/pandas/master/doc/data/titanic.csv")

In [71]:
df

Unnamed: 0,PassengerId,Survived,Pclass,Name,Sex,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked
0,1,0,3,"Braund, Mr. Owen Harris",male,22.0,1,0,A/5 21171,7.2500,,S
1,2,1,1,"Cumings, Mrs. John Bradley (Florence Briggs Th...",female,38.0,1,0,PC 17599,71.2833,C85,C
2,3,1,3,"Heikkinen, Miss Laina",female,26.0,0,0,STON/O2. 3101282,7.9250,,S
3,4,1,1,"Futrelle, Mrs. Jacques Heath (Lily May Peel)",female,35.0,1,0,113803,53.1000,C123,S
4,5,0,3,"Allen, Mr. William Henry",male,35.0,0,0,373450,8.0500,,S
...,...,...,...,...,...,...,...,...,...,...,...,...
886,887,0,2,"Montvila, Rev. Juozas",male,27.0,0,0,211536,13.0000,,S
887,888,1,1,"Graham, Miss Margaret Edith",female,19.0,0,0,112053,30.0000,B42,S
888,889,0,3,"Johnston, Miss Catherine Helen ""Carrie""",female,,1,2,W./C. 6607,23.4500,,S
889,890,1,1,"Behr, Mr. Karl Howell",male,26.0,0,0,111369,30.0000,C148,C


In [73]:
df.columns

Index(['PassengerId', 'Survived', 'Pclass', 'Name', 'Sex', 'Age', 'SibSp',
       'Parch', 'Ticket', 'Fare', 'Cabin', 'Embarked'],
      dtype='object')

In [75]:
df.head()  # first five rows

Unnamed: 0,PassengerId,Survived,Pclass,Name,Sex,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked
0,1,0,3,"Braund, Mr. Owen Harris",male,22.0,1,0,A/5 21171,7.25,,S
1,2,1,1,"Cumings, Mrs. John Bradley (Florence Briggs Th...",female,38.0,1,0,PC 17599,71.2833,C85,C
2,3,1,3,"Heikkinen, Miss Laina",female,26.0,0,0,STON/O2. 3101282,7.925,,S
3,4,1,1,"Futrelle, Mrs. Jacques Heath (Lily May Peel)",female,35.0,1,0,113803,53.1,C123,S
4,5,0,3,"Allen, Mr. William Henry",male,35.0,0,0,373450,8.05,,S


In [76]:
df.tail()  # last five rows

Unnamed: 0,PassengerId,Survived,Pclass,Name,Sex,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked
886,887,0,2,"Montvila, Rev. Juozas",male,27.0,0,0,211536,13.0,,S
887,888,1,1,"Graham, Miss Margaret Edith",female,19.0,0,0,112053,30.0,B42,S
888,889,0,3,"Johnston, Miss Catherine Helen ""Carrie""",female,,1,2,W./C. 6607,23.45,,S
889,890,1,1,"Behr, Mr. Karl Howell",male,26.0,0,0,111369,30.0,C148,C
890,891,0,3,"Dooley, Mr. Patrick",male,32.0,0,0,370376,7.75,,Q
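
Both `head` and `tail` accept a row count, and a negative count means "all except that many". A short sketch on a ten-row stand-in frame:

```python
import pandas as pd

df = pd.DataFrame({"PassengerId": range(1, 11)})

first_three = df.head(3)        # rows 1..3
last_two = df.tail(2)           # rows 9..10
all_but_last_two = df.head(-2)  # everything except the last two rows

print(first_three["PassengerId"].tolist())
print(last_two["PassengerId"].tolist())
print(len(all_but_last_two))
```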


In [78]:
df.dtypes

PassengerId      int64
Survived         int64
Pclass           int64
Name            object
Sex             object
Age            float64
SibSp            int64
Parch            int64
Ticket          object
Fare           float64
Cabin           object
Embarked        object
dtype: object
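
Once the dtypes are known, they can be used to pick columns or be changed. The sketch below uses three rows shaped like the Titanic data; converting `Survived` to `bool` is just an illustration, not something the notebook does.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "Survived": [0, 1, 1],
        "Age": [22.0, 38.0, 26.0],
        "Sex": ["male", "female", "female"],
    }
)

# Keep only the numeric columns (int64 and float64 here).
numeric = df.select_dtypes(include="number")

# astype() converts a column to another dtype.
df["Survived"] = df["Survived"].astype(bool)

print(numeric.columns.tolist())
print(df["Survived"].tolist())
```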