## Pandas is a library for data manipulation and data analysis

In Pandas, a DataFrame is a two-dimensional, potentially **heterogeneous** tabular data structure (it can hold different data types, such as integer, float, and string, across different columns) with **labeled axes** (rows and columns). It is also **size-mutable** (rows and columns can be added or dropped). It is similar to a table in a database or a sheet in an Excel spreadsheet.

DataFrames can be created from various data structures, such as dictionaries, lists, or other DataFrames.
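
As a minimal sketch of the sources mentioned above, the following builds the same small table from a dictionary, a list of dictionaries, a list of lists, and an existing DataFrame. The sample names and ages are made up for illustration.

```python
import pandas as pd

# From a dictionary: keys become column names, values become column data
from_dict = pd.DataFrame({'Name': ['Alice', 'Bob'], 'Age': [25, 30]})

# From a list of dictionaries: each dictionary becomes one row
from_records = pd.DataFrame([
    {'Name': 'Alice', 'Age': 25},
    {'Name': 'Bob', 'Age': 30},
])

# From a list of lists: column names are passed separately
from_lists = pd.DataFrame([['Alice', 25], ['Bob', 30]], columns=['Name', 'Age'])

# From another DataFrame: copy() makes the new object independent of the original
from_frame = from_dict.copy()

# equals() compares shape, values, and dtypes
print(from_dict.equals(from_records))
print(from_dict.equals(from_lists))
print(from_dict.equals(from_frame))
```

All four objects should describe the same two-row table, which is what the `equals()` checks are meant to confirm.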

[pandas documentation](https://pandas.pydata.org/docs)

In [2]:
!pip install pandas 

Collecting pandas
  Downloading pandas-2.2.3-cp312-cp312-win_amd64.whl.metadata (19 kB)
Collecting pytz>=2020.1 (from pandas)
  Downloading pytz-2024.2-py2.py3-none-any.whl.metadata (22 kB)
Collecting tzdata>=2022.7 (from pandas)
  Downloading tzdata-2024.2-py2.py3-none-any.whl.metadata (1.4 kB)
Downloading pandas-2.2.3-cp312-cp312-win_amd64.whl (11.5 MB)
Downloading pytz-2024.2-py2.py3-none-any.whl (508 kB)
Downloading tzdata-2024.2-py2.py3-none-any.whl (346 kB)
Installing collected packages: pytz, tzdata, pandas
Successfully installed pandas-2.2.3 pytz-2024.2 tzdata-2024.2


In [None]:
import pandas as pd 

In [4]:
data = { 
    'Name': ['Alice', 'Bob', 'Charlie'], 
    'Age': [25, 30, 35], 
    'City': ['New York', 'San Francisco', 'Los Angeles'] 
} 
df = pd.DataFrame(data)
print(df)  # the DataFrame holds heterogeneous data: strings and integers in different columns

      Name  Age           City
0    Alice   25       New York
1      Bob   30  San Francisco
2  Charlie   35    Los Angeles


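## Viewing Data

`head()` returns the first five rows by default. This cell was run later in the session (`In [30]`), so its output already includes the rows and columns added in the sections below.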

In [30]:
df.head()

      Name  Age           City Country Adult  Skills
0    Alice   25       New York     USA   Yes  Python
1      Bob   30  San Francisco     USA   Yes  Python
2  Charlie   35    Los Angeles     USA   Yes  Python
3    Drake   32          Texas     USA   Yes  Python
4     Fred   37      San Diego     USA   Yes  Python


#### A DataFrame is **two-dimensional**: `shape` returns its row and column counts

In [23]:
# Get the total count of rows and columns 
total_rows, total_columns = df.shape 
print(f"Total rows: {total_rows}") 
print(f"Total columns: {total_columns}")

Total rows: 7
Total columns: 3


## Reading and Writing Data
We can read data from a file into a DataFrame and, after performing operations on it, write the result back to a file.
```python
df = pd.read_csv('data.csv')
# some operations
df.to_csv('output.csv', index=False)
```
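
The round trip can be tried without an existing `data.csv`. The sketch below writes a small made-up DataFrame to a temporary directory and reads it back; the file name `people.csv` is hypothetical.

```python
import os
import tempfile

import pandas as pd

df_example = pd.DataFrame({'Name': ['Alice', 'Bob'], 'Age': [25, 30]})

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'people.csv')  # hypothetical file name

    # index=False leaves the row labels out of the file
    df_example.to_csv(path, index=False)

    # Reading the file back gives a DataFrame with a fresh default index
    restored = pd.read_csv(path)

print(restored)
print(restored.equals(df_example))
```

Without `index=False`, the row labels would be written as an extra unnamed column, which then shows up as `Unnamed: 0` when the file is read back.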

We will continue working with the DataFrame created above to explore its features and functionality.

#### A DataFrame has **labeled axes**, and data is accessed through those labels
### Accessing Column Data 

In [6]:
names = df['Name'] # single column
names

0      Alice
1        Bob
2    Charlie
Name: Name, dtype: object

In [7]:
column_list = ['Name', 'Age']
names_and_ages = df[column_list] # multiple columns
names_and_ages

      Name  Age
0    Alice   25
1      Bob   30
2  Charlie   35


### Accessing Row Data

**`loc`** is label-based. It uses row and column labels to select data, which is ideal when rows and columns have meaningful names. A label slice such as `1:2` includes both endpoints.

In [11]:
# Select the row with label 1
row1 = df.loc[1] 
print(row1) 

Name              Bob
Age                30
City    San Francisco
Name: 1, dtype: object


In [12]:
# Select the rows with labels 1 through 2 (the end label is included)
rows12 = df.loc[1:2] 
print(rows12) 

      Name  Age           City
1      Bob   30  San Francisco
2  Charlie   35    Los Angeles


In [14]:
# Select the 'Age' column for the rows with labels 1 through 2
value = df.loc[1:2, 'Age'] 
print(value)

1    30
2    35
Name: Age, dtype: int64


**`iloc`** is integer position-based. It uses zero-based integer positions to select data, which is ideal for purely positional indexing. A position slice such as `0:2` excludes its end position.

In [15]:
# Select the second row (position 1)
row = df.iloc[1] 
print(row) 

Name              Bob
Age                30
City    San Francisco
Name: 1, dtype: object


In [16]:
# Select the first two rows (positions 0 and 1; position 2 is excluded)
rows = df.iloc[0:2] 
print(rows) 

    Name  Age           City
0  Alice   25       New York
1    Bob   30  San Francisco


In [18]:
# Select the value in the second row and second column 
value = df.iloc[1, 1]
print(value)

30
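
The difference between the two indexers is easier to see when the row labels are not the default integers. The sketch below uses a hypothetical `people` table with string row labels, which is an assumption for illustration and not part of the notebook's `df`.

```python
import pandas as pd

# Hypothetical data with string row labels instead of the default 0, 1, 2
people = pd.DataFrame(
    {'Age': [25, 30, 35], 'City': ['New York', 'San Francisco', 'Los Angeles']},
    index=['alice', 'bob', 'charlie'],
)

# loc selects by label; a label slice includes its end label
by_label = people.loc['alice':'bob', 'Age']

# iloc selects by position; a position slice excludes its end position
by_position = people.iloc[0:2, 0]

print(by_label)
print(by_position)
print(by_label.equals(by_position))
```

Both selections should return the ages for `alice` and `bob`, even though one slice ends at `'bob'` and the other at position `2`.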


#### A DataFrame is **size-mutable**: rows and columns can be added or dropped
## Adding Rows
#### Using **`loc`** to add a row

In [19]:
print(df)

      Name  Age           City
0    Alice   25       New York
1      Bob   30  San Francisco
2  Charlie   35    Los Angeles


In [20]:
# Add a new row under the next integer label (this relies on the default 0..n-1 index)
df.loc[len(df)] = ['Drake', 32, 'Texas'] 
print(df)

      Name  Age           City
0    Alice   25       New York
1      Bob   30  San Francisco
2  Charlie   35    Los Angeles
3    Drake   32          Texas


#### Using **`concat()`** to add multiple rows

In [22]:
new_dataframe = pd.DataFrame(
    { 
        'Name': ['Fred', 'Gwen', 'Harry'], 
        'Age': [37, 24, 26], 
        'City': ['San Diego', 'New Jersey', 'Washington'] 
    }
)
df = pd.concat([df, new_dataframe], ignore_index=True)
df

      Name  Age           City
0    Alice   25       New York
1      Bob   30  San Francisco
2  Charlie   35    Los Angeles
3    Drake   32          Texas
4     Fred   37      San Diego
5     Gwen   24     New Jersey
6    Harry   26     Washington


## Adding Columns
#### Using **`insert()`**

In [26]:
total_rows, total_columns = df.shape 
new_column_index = total_columns 
column_name = 'Country'
column_default_value = ['USA'] * total_rows

df.insert(new_column_index, column_name, column_default_value)
df

      Name  Age           City Country
0    Alice   25       New York     USA
1      Bob   30  San Francisco     USA
2  Charlie   35    Los Angeles     USA
3    Drake   32          Texas     USA
4     Fred   37      San Diego     USA
5     Gwen   24     New Jersey     USA
6    Harry   26     Washington     USA
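
The first argument of `insert()` is the position of the new column, so it is not limited to appending at the end. A minimal sketch, using a small made-up table and a hypothetical `ID` column:

```python
import pandas as pd

df_example = pd.DataFrame({'Name': ['Alice', 'Bob'], 'Age': [25, 30]})

# Insert a hypothetical 'ID' column at position 0 so it becomes the first column.
# insert() modifies the DataFrame in place and returns None.
df_example.insert(0, 'ID', [101, 102])

print(list(df_example.columns))
print(df_example)
```

Because `insert()` works in place, its result should not be assigned back to the variable.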


#### Using **`assign()`**

In [29]:
# Adding multiple columns using assign
df = df.assign(
    Adult=lambda x: ['Yes' if age > 18 else 'No' for age in x['Age']],
    Skills=['Python'] * df.shape[0],
)
print(df)

      Name  Age           City Country Adult  Skills
0    Alice   25       New York     USA   Yes  Python
1      Bob   30  San Francisco     USA   Yes  Python
2  Charlie   35    Los Angeles     USA   Yes  Python
3    Drake   32          Texas     USA   Yes  Python
4     Fred   37      San Diego     USA   Yes  Python
5     Gwen   24     New Jersey     USA   Yes  Python
6    Harry   26     Washington     USA   Yes  Python
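
The same columns can also be built without a Python loop. The sketch below is one possible alternative, not the notebook's own method: it uses `numpy.where` for the conditional column and a scalar for the constant column, on a small made-up table that includes one age below the threshold.

```python
import numpy as np
import pandas as pd

people = pd.DataFrame({'Name': ['Alice', 'Tim'], 'Age': [25, 12]})  # made-up rows

people = people.assign(
    # The comparison runs on the whole column at once; np.where picks a value per row
    Adult=lambda x: np.where(x['Age'] > 18, 'Yes', 'No'),
    # A scalar is broadcast to every row
    Skills='Python',
)

print(people)
```

`assign()` returns a new DataFrame, which is why the result is assigned back to `people`.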


Dropping Rows
Dropping Columns
Filtering Rows
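
The three topics above can be sketched briefly as follows. The `people` table repeats a few rows from the examples earlier in this notebook so that the block runs on its own.

```python
import pandas as pd

people = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'Drake'],
    'Age': [25, 30, 35, 32],
    'City': ['New York', 'San Francisco', 'Los Angeles', 'Texas'],
})

# Drop a row by its index label; drop() returns a new DataFrame by default
without_bob = people.drop(index=1)

# Drop a column by name
without_city = people.drop(columns=['City'])

# Filter rows with a boolean mask
over_30 = people[people['Age'] > 30]

# Combine conditions with & and wrap each condition in parentheses
over_30_not_texas = people[(people['Age'] > 30) & (people['City'] != 'Texas')]

print(without_bob)
print(without_city)
print(over_30)
print(over_30_not_texas)
```

None of these calls change `people` itself, so the original table is still available after each step.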

Descriptive Statistics
Handling Missing Data
Grouping and Aggregating Data
Merging and Joining DataFrames
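
A short sketch of the four topics above, under stated assumptions: the `people` table, its missing age, and the `regions` lookup table are all made up for illustration.

```python
import pandas as pd

people = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'Drake'],
    'Age': [25.0, 30.0, None, 35.0],  # one missing value on purpose
    'City': ['New York', 'New York', 'Texas', 'Texas'],
})

# Descriptive statistics: count, mean, std, min, quartiles, and max of numeric columns
print(people.describe())

# Missing data: count the gaps, then either drop those rows or fill them in
missing_per_column = people.isna().sum()
dropped = people.dropna(subset=['Age'])
filled = people.fillna({'Age': people['Age'].mean()})

# Grouping and aggregating: mean age per city (missing values are skipped)
mean_age_by_city = people.groupby('City')['Age'].mean()

# Merging: attach a region to each row by matching on the 'City' column
regions = pd.DataFrame({'City': ['New York', 'Texas'], 'Region': ['East', 'South']})
merged = people.merge(regions, on='City', how='left')

print(missing_per_column)
print(filled)
print(mean_age_by_city)
print(merged)
```

A left merge keeps every row of `people`, even if a city has no match in the lookup table.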