==================Pandas==================
Pandas is a fast, powerful and flexible open-source Python library built on top of NumPy, designed for cleaning, analyzing and manipulating data.
Think of Pandas as Excel + SQL + Python combined.

It lets you:
- Load data (from CSV, Excel, JSON, SQL)
- Clean it (remove nulls, rename columns)
- Analyze it (aggregate, group, sort)
- Visualize or prepare it for ML models.
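A minimal sketch of the load, clean and analyze flow listed above. It assumes a tiny made-up CSV held in memory with io.StringIO in place of a real file. The column names and data are hypothetical.

```python
import io
import pandas as pd

# Hypothetical in-memory CSV standing in for a real file; Chitra's salary is missing
csv_text = """name,dept,salary
Asha,HR,48000
Bala,Finance,54000
Chitra,IT,
Dinesh,IT,62000
"""

# Load: read_csv accepts any file-like object, not only a file path
df = pd.read_csv(io.StringIO(csv_text))

# Clean: drop rows with missing values, then give the columns clearer names
df = df.dropna().rename(columns={"name": "Name", "dept": "Department", "salary": "Salary"})

# Analyze: average salary per department, highest first
summary = df.groupby("Department")["Salary"].mean().sort_values(ascending=False)
print(summary)
```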

==================Series==================
A Pandas Series is a 1D labeled array capable of holding any data type (integers, strings, floats, ...).
A Series is like a column in Excel or a single list with labels (indexes).

----------------------------------------------------------------------

import pandas as pd

data = pd.Series([10, 20, 30, 40])
print(data)

Output:
    0    10
    1    20
    2    30
    3    40
    dtype: int64

Here 0, 1, 2, 3 are the indexes.

----------------------------------------------------------------------

Another Example:
marks = pd.Series([85, 92, 78], index=['Math', 'Science', 'English'])
print(marks)

Output:
    Math       85
    Science    92
    English    78
    dtype: int64

Here Math, Science and English are the indexes (indexes can also be specified manually, as we did in this example).
Rule 1. The number of index labels must equal the number of elements; otherwise pandas raises a ValueError.
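A short check of Rule 1, plus one more way to supply labels. Passing a dict makes its keys the index. The subject names are just the example values from above.

```python
import pandas as pd

# Rule 1: 3 values but only 2 index labels -> pandas refuses to build the Series
raised = False
try:
    pd.Series([85, 92, 78], index=['Math', 'Science'])
except ValueError as err:
    raised = True
    print("ValueError:", err)

# A dict is another way to give labels: keys become the index, values the data
marks = pd.Series({'Math': 85, 'Science': 92, 'English': 78})
print(marks['Science'])   # 92
```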

In [None]:
import pandas as pd

temperature = pd.Series([11, 29, 33, 30, 28], index=['Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday'])
# We can inspect the index and values like below
print(temperature.index)    # the labels: Monday ... Friday
print(temperature.values)   # the underlying NumPy array: [11 29 33 30 28]
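A small sketch of what the labels buy us, reusing the same temperature data. It shows access by label vs. position, element-wise arithmetic that keeps the index, and boolean filtering. The Fahrenheit conversion is only an illustration.

```python
import pandas as pd

temperature = pd.Series([11, 29, 33, 30, 28],
                        index=['Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday'])

print(temperature['Wednesday'])   # label-based access -> 33
print(temperature.iloc[0])        # position-based access -> 11

# Arithmetic is applied element-wise and the labels are kept
fahrenheit = temperature * 9 / 5 + 32
print(fahrenheit['Thursday'])     # 86.0

# Boolean filtering keeps only the labels where the condition is True
hot_days = temperature[temperature > 29]
print(list(hot_days.index))       # ['Wednesday', 'Thursday']
```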

==================DataFrame==================

A DataFrame is a two-dimensional, size-mutable, and heterogeneous tabular data structure with labeled axes (rows and columns).
If a Series is like a single column, a DataFrame is a full table: the real powerhouse for data analysis.

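There are multiple ways to create a pandas DataFrame:
    1. From a dictionary (keys become column names)
    2. From a list of dictionaries (each dictionary becomes one row)
    3. From a NumPy array (pass column names with columns=, otherwise they default to 0, 1, 2, ...)
    4. By importing a CSV file as a DataFrame (pd.read_csv)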
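A minimal sketch of the four creation methods listed above. The data is made up, and the CSV is an in-memory string standing in for a real file. With a real file you would pass a path such as 'students.csv' (hypothetical name) to pd.read_csv.

```python
import io
import numpy as np
import pandas as pd

# 1. From a dictionary: keys become column names
df1 = pd.DataFrame({'Name': ['Arjun', 'John'], 'Age': [20, 22]})

# 2. From a list of dictionaries: each dict is one row
df2 = pd.DataFrame([{'Name': 'Arjun', 'Age': 20}, {'Name': 'John', 'Age': 22}])

# 3. From a NumPy array: give column names, otherwise they default to 0, 1, ...
arr = np.array([[20, 85], [22, 92]])
df3 = pd.DataFrame(arr, columns=['Age', 'Marks'])

# 4. From a CSV: an in-memory string stands in for a real file here
df4 = pd.read_csv(io.StringIO("Name,Age\nArjun,20\nJohn,22\n"))

print(df1.equals(df2))   # True: same data, whichever way it was built
print(df3)
```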

Exploring a DataFrame
    - df.head()        # first 5 rows
    - df.tail(2)       # last 2 rows
    - df.info()        # column info and data types
    - df.describe()    # numerical summary stats
    - df.shape         # rows, columns
    - df.columns       # column names
    - df.dtypes        # data types

Accessing data in a DataFrame:
    - data['col1']              #Selecting one column (returns a Series)
    - data[['col1', 'col2']]    #Selecting multiple columns (returns a DataFrame)
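A quick way to see the difference between single and double brackets on a throwaway DataFrame with hypothetical columns col1 to col3: the outer brackets select, and an inner list asks for a DataFrame.

```python
import pandas as pd

data = pd.DataFrame({'col1': [1, 2], 'col2': [3, 4], 'col3': [5, 6]})

one = data['col1']             # single brackets -> Series
many = data[['col1', 'col2']]  # a list inside the brackets -> DataFrame
also_df = data[['col1']]       # even a one-item list gives a DataFrame

print(type(one).__name__, type(many).__name__, type(also_df).__name__)
# Series DataFrame DataFrame
```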

==================loc&iloc==================

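When we create a DataFrame without specifying an index, pandas automatically generates a row index (0, 1, 2, ...). In the example below, the labels 0, 1, 2 on the left of the output are that auto-generated index. Notice that after sort_values('Age') the rows are reordered, but each row keeps its original label.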
------------------------------------------------------------------------------------------
-Example
    students = pd.DataFrame({
        'Name': ['Arjun', 'John', 'Kavin'],    ||    -Output
        'Age': [20, 22, 21],                   ||         Name  Age  Marks
        'Marks': [85, 92, 88]                  ||     0  Arjun   20     85
    }).sort_values('Age')                      ||     2  Kavin   21     88
                                               ||     1   John   22     92
    print(students)
------------------------------------------------------------------------------------------

So, when we use:
    -students.loc[] -> Label-based: pandas looks up the exact label in the index. For example, .loc[2] returns the row whose index label is 2 (Kavin), wherever that row now sits.
    -students.iloc[] -> Position-based: pandas returns the row at that position, counting from 0, just like normal Python list indexing. For example, .iloc[2] returns the third row, which after sorting by Age is John.
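The same students example, written so the loc/iloc difference can be checked directly after the sort. The comments show what pandas returns for this data.

```python
import pandas as pd

students = pd.DataFrame({
    'Name': ['Arjun', 'John', 'Kavin'],
    'Age': [20, 22, 21],
    'Marks': [85, 92, 88]
}).sort_values('Age')              # rows reordered; labels travel with them

print(list(students.index))        # [0, 2, 1]
print(students.loc[2, 'Name'])     # Kavin (row whose label is 2)
print(students.iloc[2]['Name'])    # John  (third row by position)
```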



Feature                 | .loc                        | .iloc
---------------------------------------------------------------------------------
Indexing type           | Label-based                 | Position-based
Multiple rows/columns   | Pass a list of labels       | Pass a list of positions
Order of the list       | Preserved in the result     | Preserved in the result
Duplicates in the list  | Allowed                     | Allowed
Slicing (start:end)     | Includes the end label      | Excludes the end position
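A small check of the last table row, using a throwaway DataFrame with the default index 0 to 4 so that labels and positions start out identical:

```python
import pandas as pd

df = pd.DataFrame({'x': [10, 20, 30, 40, 50]})   # default index 0..4

by_label = df.loc[1:3]       # labels 1, 2, 3 -> end label INCLUDED
by_position = df.iloc[1:3]   # positions 1, 2 -> end position EXCLUDED

print(by_label['x'].tolist())     # [20, 30, 40]
print(by_position['x'].tolist())  # [20, 30]
```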

In [None]:
students = pd.DataFrame({
    'Name': ['Arjun', 'John', 'Kavin'],
    'Age': [20, 22, 21],
    'Marks': [85, 92, 88]
}).sort_values('Marks')

print(students)

In [None]:
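print(students.loc[2])                       # row with label 2 -> Kavin
print(students.iloc[2])                      # third row by position -> John
print(students.iloc[0, 2])                   # first row, third column (Marks) -> 85
print(students.loc[1, 'Marks'])              # row label 1, column 'Marks' -> 92
print(students.loc[0:1])                     # from label 0 through label 1 (inclusive); label 1 is now last, so all 3 rows
print(students.iloc[0:1])                    # positions 0 up to (not including) 1 -> only Arjun
print(students.loc[students['Marks'] > 87])  # boolean filter -> Kavin and John
students.loc[students['Marks'] > 88, 'Marks'] += 1  # => Finds the students with marks above 88 and adds 1 to their marks.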


##FANCY INDEXING: 
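#Using only rows
print(students.iloc[[0, 2]])                    # positions 0 and 2 -> Arjun, John
print(students.loc[[0, 2]])                     # labels 0 and 2 -> Arjun, Kavin

#Using rows and columns
print(students.iloc[[0, 2], [0, 2]])            # same rows, columns at positions 0 and 2 (Name, Marks)
print(students.loc[[0, 2], ['Name', 'Marks']])  # labels 0 and 2, columns Name and Marks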

==================Data Exploration & Basic Operations==================
1. Inspecting Data
    - head()  
    - tail() 
    - info()
    - describe() 
    - shape
    - columns 
    - dtypes

2. Sorting & Ranking
    - sort_values(by='col_name', ascending=True)   # sort rows by a column (or pass a list of columns)
    - sort_index()                                 # sort rows by their index labels
    - rank()  --> df['col_we_need_to_rank'].rank(method='average', ascending=True)
              # method ('average', 'min', 'max', 'dense', 'first') decides how ties are ranked

3. Adding & Modifying Columns
    - df['new_col'] = df['existing_col'] + 1   (literally any expression can go here)  ---> To add a column
    - df['existing_col'] = df['existing_col'] + 1   (literally any expression can go here)  ---> To update a column

4. Dropping rows & columns
    - df.drop(labels=None, axis=0, inplace=False)
            -- labels = row index label(s) or column name(s)
            -- axis = 0 → drop rows, 1 → drop columns
            -- inplace = True modifies df directly (and returns None); False (default) returns a new DataFrame
    
    Ex: 1. students.drop('BonusMarks', axis=1, inplace=True)        #Drop a column by name, in place
        2. students.drop(['BonusMarks', 'Total'], axis=1)           #Drop several columns, returns a new DataFrame
        3. students.drop([1, 3], axis=0)                            #Drop rows by index label
    
5. Basic Aggregations
    - sum() 
    - mean() 
    - min() 
    - max() 
    - count()
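A compact sketch of three points from the list above on made-up data: how each rank() method handles a tie, that drop() leaves the original alone unless inplace=True, and the basic aggregations on one column.

```python
import pandas as pd

scores = pd.Series([50, 70, 70, 90])   # the two 70s are tied

# ascending=True (the default), so the smallest value gets rank 1
for method in ['average', 'min', 'max', 'dense', 'first']:
    print(method, scores.rank(method=method).tolist())
# average -> [1.0, 2.5, 2.5, 4.0]
# min     -> [1.0, 2.0, 2.0, 4.0]
# max     -> [1.0, 3.0, 3.0, 4.0]
# dense   -> [1.0, 2.0, 2.0, 3.0]
# first   -> [1.0, 2.0, 3.0, 4.0]

df = pd.DataFrame({'Marks': [85, 92, 88], 'Bonus': [1, 2, 3]})

# drop() returns a new DataFrame by default; df itself still has both columns
trimmed = df.drop('Bonus', axis=1)
print(list(df.columns), list(trimmed.columns))   # ['Marks', 'Bonus'] ['Marks']

# Basic aggregations on a single column
marks = df['Marks']
print(marks.sum(), marks.mean(), marks.min(), marks.max(), marks.count())
```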

In [None]:
print(students.head())
print(students.tail(1))
students.info()   # info() prints its summary itself and returns None, so no print() is needed
print(students.describe())
print(students.shape)
print(students.columns)
print(students.dtypes)

In [None]:
data = {
    'Name': ['Asha', 'Bala', 'Chitra', 'Dinesh', 'Esha', 'Farhan', 'Gita'],
    'Department': ['HR', 'Finance', 'IT', 'IT', 'Finance', 'HR', 'Finance'],
    'Age': [29, 35, 31, 42, 28, 39, 30],
    'Salary': [48000, 54000, 50000, 62000, 45000, 58000, 49000],
    'Experience': [4, 8, 6, 12, 3, 9, 5]
}

employees = pd.DataFrame(data)
print(employees)

In [69]:
#1. Sorting.
#a. Sort all employees by their Salary in ascending order.
#b. Sort by Department first, then by Age within each department (ascending).
#c. Sort by Experience in descending order.

sort_salary = employees.sort_values('Salary', ascending=True)
#print(sort_salary)

sort_dept_age = employees.sort_values(['Department', 'Age'], ascending=[True, True])
#print(sort_dept_age)

sort_experience = employees.sort_values('Experience', ascending=False)
#print(sort_experience)

#--------------------------------------------------------------------------------#

#2. Ranking
#
#a. Create a new column called "SalaryRank" that ranks employees based on their salary (highest salary = rank 1).
#b. Then create another column "AgeRank" ranking by age (oldest = rank 1).

employees['SalaryRank'] = employees['Salary'].rank(ascending=False)
employees['AgeRank'] = employees['Age'].rank(ascending=False)

#--------------------------------------------------------------------------------#

#3. Adding & Modifying Columns
#
#a. Add a new column "Bonus" that is 10% of Salary.
#b. Add another column "TotalPay" = Salary + Bonus.
#c. Increase "Salary" by 5% for employees in the "Finance" department.

employees['Bonus'] = employees['Salary'] * 0.10
employees['TotalPay'] = employees['Salary'] + employees['Bonus']
employees.loc[employees['Department'] == 'Finance', 'Salary'] *= 1.05   # +5% (multiplying by 0.05 would cut salary to 5%)

print(employees)
print('\n')
#--------------------------------------------------------------------------------#

#4. Dropping Rows or Columns
#
#a. Drop the "Experience" column.
#b. Drop the row(s) where "Department" is "HR".

drop_ex = employees.drop('Experience', axis=1)
drop_dep = employees.drop(employees[employees['Department'] == 'HR'].index)


     Name Department  Age  Salary  Experience  SalaryRank  AgeRank   Bonus  \
0    Asha         HR   29   48000           4         6.0      6.0  4800.0
1    Bala    Finance   35   56700           8         3.0      3.0  5400.0
2  Chitra         IT   31   50000           6         4.0      4.0  5000.0
3  Dinesh         IT   42   62000          12         1.0      1.0  6200.0
4    Esha    Finance   28   47250           3         7.0      7.0  4500.0
5  Farhan         HR   39   58000           9         2.0      2.0  5800.0
6    Gita    Finance   30   51450           5         5.0      5.0  4900.0

   TotalPay
0   52800.0
1   59400.0
2   55000.0
3   68200.0
4   49500.0
5   63800.0
6   53900.0




In [71]:
import pandas as pd

data = {
    'Name': ['Asha', 'Bala', 'Chitra', 'Dinesh', 'Esha', 'Farhan', 'Gita', 'Hemanth', 'Isha', 'Jai'],
    'Department': ['HR', 'Finance', 'IT', 'IT', 'Finance', 'HR', 'Finance', 'IT', 'Finance', 'HR'],
    'Age': [29, 35, 31, 42, 28, 39, 30, 33, 26, 41],
    'Salary': [48000, 54000, 50000, 62000, 45000, 58000, 49000, 52000, 43000, 61000],
    'Experience': [4, 8, 6, 12, 3, 9, 5, 7, 2, 10]
}

hr_db = pd.DataFrame(data)
hr_db

      Name Department  Age  Salary  Experience
0     Asha         HR   29   48000           4
1     Bala    Finance   35   54000           8
2   Chitra         IT   31   50000           6
3   Dinesh         IT   42   62000          12
4     Esha    Finance   28   45000           3
5   Farhan         HR   39   58000           9
6     Gita    Finance   30   49000           5
7  Hemanth         IT   33   52000           7
8     Isha    Finance   26   43000           2
9      Jai         HR   41   61000          10


In [132]:
#1. Salary & Department Insights
#   - Sort the DataFrame by Department first, then by Salary (descending).
#   - Display the top 2 highest-paid employees per department.


sorted_hr = hr_db.sort_values(['Department', 'Salary'], ascending=[True, False])
#sorted_hr

top2_emp = hr_db.sort_values('Salary', ascending=False).groupby('Department').head(2)
#top2_emp


#2. Rank & Filter
#   - Add a column SalaryRank → rank employees based on salary (highest = rank 1).
#   - Add a column ExperienceRank → rank employees based on experience (highest = rank 1).
#   - Create a new column OverallRank = average of both ranks (lower = better overall performer).
#   - Display only the top 5 overall performers.

hr_db['SalaryRank'] = hr_db['Salary'].rank(ascending=False)
hr_db['ExperienceRank'] = hr_db['Experience'].rank(ascending=False)
hr_db['OverallRank'] = (hr_db['SalaryRank'] + hr_db['ExperienceRank']) / 2   # parentheses needed: average of both ranks
overall_performer = hr_db.sort_values('OverallRank').head(5)
#overall_performer

#print(hr_db)

#3. Reward Top Employees
#   - Add a column Bonus:
#       - If OverallRank <= 3, give 15% of salary as bonus.
#       - Else give 7% of salary as bonus.
#   - Add a new column TotalPay = Salary + Bonus.

hr_db['Bonus'] = 0.0

hr_db.loc[hr_db['OverallRank'] <= 3, 'Bonus'] = hr_db['Salary'] * 0.15
hr_db.loc[hr_db['OverallRank'] > 3, 'Bonus'] = hr_db['Salary'] * 0.07

hr_db['TotalPay'] = hr_db['Salary'] + hr_db['Bonus']

#4. Drop & Clean Data
#   - Drop any employees with less than 3 years of experience.
#   - Drop the ExperienceRank column once you’re done.

hr_db.drop(hr_db[hr_db['Experience'] < 3].index, inplace=True)
hr_db.drop('ExperienceRank', axis=1, inplace=True)

#5. (Bonus: Optional) Department Summary
#   - Generate a quick summary:
#       - Average Salary per Department
#       - Average Bonus per Department
#       - Count of Employees per Department

print(hr_db.groupby('Department').agg(
    avg_sal=('Salary','mean'),
    avg_bonus=('Bonus', 'mean'),
    count_of_emp=('Name', 'count')
))

                 avg_sal    avg_bonus  count_of_emp
Department                                         
Finance     49333.333333  3453.333333             3
HR          55666.666667  7070.000000             3
IT          54666.666667  5480.000000             3
