<a href="https://colab.research.google.com/github/sanjeevan-nxtworx/Pandas/blob/main/07_DataFrame_Operations.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

### **Operations on a Pandas DataFrame**

In [1]:
import pandas as pd

### Types of Operations

| Operations | Example Operations |
|---|---|
| Basic Operations | `.shape`, `.info()`, `.describe()` |
| Selection & Indexing | `.loc[]`, `.iloc[]`, selecting columns |
| Filtering | `df[df['Age'] > 30]` |
| Aggregation & Grouping | `.sum()`, `.mean()`, `.groupby()` |
| Sorting | `.sort_values(by='Age')` |
| Joining & Merging | `pd.concat()`, `pd.merge()` |
| Handling Missing Data | `.isnull()`, `.dropna()`, `.fillna()` |
| Transformation | `.rename()`, `.astype()`, `.apply()` |
| Statistical Analysis | `.mean()`, `.std()`, `.max()`, `.min()` |
| File I/O | `.to_csv()`, `pd.read_csv()`, `.to_json()` |
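
As a quick orientation, here is a minimal sketch of a few of these operations on a tiny DataFrame. The `people` frame and its values are made up for illustration and are not part of the dataset used later in this notebook.

```python
import pandas as pd

# Hypothetical toy data, used only for illustration
people = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 35],
})

# Basic operations: .shape is an attribute (no parentheses)
print(people.shape)       # (rows, columns)
people.info()             # column names, dtypes and non-null counts
print(people.describe())  # summary statistics for numeric columns

# Filtering: keep rows where the condition is True
over_28 = people[people['Age'] > 28]

# Sorting: order rows by a column, largest first
oldest_first = people.sort_values(by='Age', ascending=False)

# Aggregation: reduce a column to a single value
mean_age = people['Age'].mean()

print(over_28)
print(oldest_first)
print(mean_age)
```

Each call returns a new object and leaves `people` unchanged, so the results can be chained or stored under separate names.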


## Selection and Indexing

In [2]:
df = pd.read_csv("googl_data_2020_2025.csv")
df

Unnamed: 0,Date,Adj Close,Close,High,Low,Open,Volume
0,2020-01-02 00:00:00+00:00,68.186821,68.433998,68.433998,67.324501,67.420502,27278000
1,2020-01-03 00:00:00+00:00,67.830101,68.075996,68.687500,67.365997,67.400002,23408000
2,2020-01-06 00:00:00+00:00,69.638054,69.890503,69.916000,67.550003,67.581497,46768000
3,2020-01-07 00:00:00+00:00,69.503548,69.755501,70.175003,69.578003,70.023003,34330000
4,2020-01-08 00:00:00+00:00,69.998253,70.251999,70.592499,69.631500,69.740997,35314000
...,...,...,...,...,...,...,...
1253,2024-12-24 00:00:00+00:00,196.110001,196.110001,196.110001,193.779999,194.839996,10403300
1254,2024-12-26 00:00:00+00:00,195.600006,195.600006,196.750000,194.380005,195.149994,12046600
1255,2024-12-27 00:00:00+00:00,192.759995,192.759995,195.320007,190.649994,194.949997,18891400
1256,2024-12-30 00:00:00+00:00,191.240005,191.240005,192.550003,189.119995,189.800003,14264700


### Selecting a Single Column

In [4]:
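# Selecting one column with single brackets returns a Series
adjSeries = df['Adj Close']
adjSeries.dtype  # not displayed: a notebook cell only echoes its last expression
adjSeries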

Unnamed: 0,Adj Close
0,68.186821
1,67.830101
2,69.638054
3,69.503548
4,69.998253
...,...
1253,196.110001
1254,195.600006
1255,192.759995
1256,191.240005


### Selecting Multiple Columns

In [7]:
print(df[['High', 'Low']].head())  # Multiple columns

        High        Low
0  68.433998  67.324501
1  68.687500  67.365997
2  69.916000  67.550003
3  70.175003  69.578003
4  70.592499  69.631500
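
The bracket style decides the return type: a single column name gives a Series, while a list of names gives a DataFrame, even when the list holds only one name. The sketch below shows this on a small made-up frame (`prices` and its values are illustrative, not taken from the CSV above).

```python
import pandas as pd

# Hypothetical two-row frame, used only for illustration
prices = pd.DataFrame({
    'High': [68.43, 68.69],
    'Low': [67.32, 67.37],
})

high_series = prices['High']    # single name -> Series (1-dimensional)
high_frame = prices[['High']]   # list of names -> DataFrame (2-dimensional)

print(type(high_series).__name__, high_series.shape)
print(type(high_frame).__name__, high_frame.shape)
```

The distinction matters when a later step expects a two-dimensional object, in which case the double-bracket form is the safer choice.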


### Using loc and iloc

| Feature | `.loc[]` | `.iloc[]` |
|---|---|---|
| Selection Type | Label-based selection | Position-based selection |
| Indexing | Uses row/column labels | Uses row/column integer positions |
| Inclusive Range | Both start and end labels are included | End index is excluded (like Python slicing) |
| Usage with Slices | Works with explicit labels | Works with integer-based ranges |
| Error Handling | Raises `KeyError` if a label is missing | Raises `IndexError` only when a single position is out of range; out-of-range slices are truncated |
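
The slicing and error-handling differences are easiest to see side by side. The following is a small sketch on a made-up frame (`staff` is a hypothetical name chosen so it does not clash with the sample DataFrame used in the cells below).

```python
import pandas as pd

# Hypothetical frame with string labels, used only for illustration
staff = pd.DataFrame({'Age': [25, 30, 35]}, index=['A', 'B', 'C'])

# Label slice: both endpoints are included
by_label = staff.loc['A':'B']      # rows A and B

# Position slice: the end position is excluded
by_position = staff.iloc[0:1]      # row A only

# A missing label raises KeyError
missing_label_raised = False
try:
    staff.loc['Z']
except KeyError:
    missing_label_raised = True

# A single out-of-range position raises IndexError
bad_position_raised = False
try:
    staff.iloc[10]
except IndexError:
    bad_position_raised = True

# An out-of-range slice is silently truncated instead of raising
clipped = staff.iloc[1:10]         # rows B and C

print(by_label, by_position, clipped, sep='\n\n')
print(missing_label_raised, bad_position_raised)
```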


In [8]:
# Sample DataFrame
data = {
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 35],
    'Salary': [50000, 60000, 70000]
}

dfSample = pd.DataFrame(data, index=['A', 'B', 'C'])


### Label-based Selection

In [10]:
# Select a row by label
print(dfSample.loc['A'])


Name      Alice
Age          25
Salary    50000
Name: A, dtype: object


In [11]:
# Select multiple rows using labels
print(dfSample.loc[['A', 'C']])


      Name  Age  Salary
A    Alice   25   50000
C  Charlie   35   70000


In [12]:
# Select a range of rows (inclusive)
print(dfSample.loc['A':'C'])



      Name  Age  Salary
A    Alice   25   50000
B      Bob   30   60000
C  Charlie   35   70000


In [14]:
# Select specific columns
print(dfSample.loc[:, ['Name', 'Salary']])


      Name  Salary
A    Alice   50000
B      Bob   60000
C  Charlie   70000


In [24]:
# Select specific rows and columns
print(dfSample.loc[['A'], ['Name', 'Salary']])

    Name  Salary
A  Alice   50000


### Position-based Selection

In [27]:
# Select a row by position
print(dfSample.iloc[0])


Name      Alice
Age          25
Salary    50000
Name: A, dtype: object


In [28]:
# Select multiple rows using integer positions
print(dfSample.iloc[[0, 2]])


      Name  Age  Salary
A    Alice   25   50000
C  Charlie   35   70000


In [30]:
# Select a range of rows (end index is excluded)
print(dfSample.iloc[0:2])  # Only rows 0 and 1 are selected


    Name  Age  Salary
A  Alice   25   50000
B    Bob   30   60000


In [31]:
# Select specific columns by position
print(dfSample.iloc[:, [0, 2]])


      Name  Salary
A    Alice   50000
B      Bob   60000
C  Charlie   70000
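
With string labels such as `'A'`, `'B'`, `'C'`, it is obvious which accessor is in use. When the index itself holds integers, the same number can mean different rows to `.loc[]` and `.iloc[]`. A minimal sketch, assuming a made-up frame `orders` whose integer labels run in reverse order:

```python
import pandas as pd

# Hypothetical frame whose integer labels do not match row positions
orders = pd.DataFrame({'Item': ['pen', 'ink', 'pad']}, index=[2, 1, 0])

by_label = orders.loc[0]       # the row whose label is 0 (the last row)
by_position = orders.iloc[0]   # the row at position 0 (the first row)

print(by_label['Item'])
print(by_position['Item'])
```

Here `orders.loc[0]` and `orders.iloc[0]` return different rows, so it helps to decide up front whether a number refers to a label or a position.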


### Filtering & Conditional Selection