**English:** Import pandas for data manipulation and numpy for numerical operations.
**Hindi:** Pandas ko data manipulation ke liye aur numpy ko numerical operations ke liye import karein.

In [1]:
import pandas as pd
import numpy as np

**English:** Create a dictionary containing employee data with some missing values, then convert it into a pandas DataFrame.
**Hindi:** Kuchh missing values ke saath employee data wala ek dictionary banayein, fir use pandas DataFrame mein convert karein.

In [2]:
data = {
    'Name': ['Amit', 'Priya', 'Rohan', 'Sneha', np.nan, 'Tina', 'Arjun', 'Neha'],
    'Department': ['HR', 'IT', 'Finance', np.nan, 'Finance', 'HR', 'IT', 'Finance'],
    'Age': [25, np.nan, 35, 28, 32, 41, 30, np.nan],
    'Salary': [35000, 60000, np.nan, 58000, 72000, 40000, 62000, 67000],
    'City': ['Pune', 'Mumbai', 'Delhi', 'Pune', 'Chennai', np.nan, 'Delhi', 'Pune']
}

df = pd.DataFrame(data)


**English:** Count the number of missing (NaN) values in each column of the DataFrame.
**Hindi:** DataFrame ke har column mein missing (NaN) values ki sankhya ginein.

In [3]:
df.isna().sum()

Name          1
Department    1
Age           2
Salary        1
City          1
dtype: int64
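
A complementary view is the share of missing values per column rather than the raw count. The sketch below is only an illustration: it rebuilds two of the columns in a small stand-in frame (`sample` and `missing_share` are hypothetical names, not part of the notebook) and averages the boolean mask returned by `isna()`.

```python
import numpy as np
import pandas as pd

# Stand-in frame with two columns from the employee data (hypothetical name `sample`).
sample = pd.DataFrame({
    'Age': [25, np.nan, 35, 28, 32, 41, 30, np.nan],
    'Salary': [35000, 60000, np.nan, 58000, 72000, 40000, 62000, 67000],
})

# isna() yields a boolean mask; its column-wise mean is the fraction of missing entries.
missing_share = sample.isna().mean()
print(missing_share)
```

With 2 of 8 ages and 1 of 8 salaries missing, the shares come out as 0.25 and 0.125.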

**English:** Display the current state of the DataFrame.
**Hindi:** DataFrame ki vartaman sthiti dikhayein.

In [4]:
df

    Name  Department   Age   Salary     City
0   Amit          HR  25.0  35000.0     Pune
1  Priya          IT   NaN  60000.0   Mumbai
2  Rohan     Finance  35.0      NaN    Delhi
3  Sneha         NaN  28.0  58000.0     Pune
4    NaN     Finance  32.0  72000.0  Chennai
5   Tina          HR  41.0  40000.0      NaN
6  Arjun          IT  30.0  62000.0    Delhi
7   Neha     Finance   NaN  67000.0     Pune


**English:** Fill missing 'Age' values with the mean of the existing ages.
**Hindi:** Missing 'Age' values ko maujooda umar ke mean se bharein.

In [5]:
# Assign the filled column back; a chained inplace call only modifies a temporary copy.
df['Age'] = df['Age'].fillna(df['Age'].mean())
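
pandas warns that calling `fillna(..., inplace=True)` on a selected column such as `df['Age']` will stop updating the frame in pandas 3.0, and it suggests two replacements. A minimal sketch of both, on a small stand-in frame (`sample` is a hypothetical name used only for illustration):

```python
import numpy as np
import pandas as pd

sample = pd.DataFrame({
    'Age': [25, np.nan, 35],
    'Department': ['HR', np.nan, 'Finance'],
})

# Option 1: assign the filled column back to the frame.
sample['Age'] = sample['Age'].fillna(sample['Age'].mean())

# Option 2: call fillna on the frame itself with a {column: value} mapping.
sample = sample.fillna({'Department': 'unknown'})

print(sample)
```

The mapping form is convenient when several columns need different fill values in one call.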


**English:** Convert the 'Age' column to integer type.
**Hindi:** 'Age' column ko integer type mein convert karein.

In [6]:
df['Age'] = df['Age'].astype(int)
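
Note that `astype(int)` drops the fractional part instead of rounding. The mean of the six known ages is 191 / 6, roughly 31.83, so the filled rows end up as 31. The sketch below contrasts truncation with rounding first; the variable names are illustrative only.

```python
import numpy as np
import pandas as pd

ages = pd.Series([25, np.nan, 35, 28, 32, 41, 30, np.nan])
filled = ages.fillna(ages.mean())  # mean of the six known ages is 191 / 6

truncated = filled.astype(int)        # drops the fractional part
rounded = filled.round().astype(int)  # rounds to the nearest integer before casting

print(truncated[1], rounded[1])
```

Which of the two is appropriate depends on how the imputed ages will be used; the notebook's output corresponds to the truncating variant.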

**English:** Fill missing 'Department' values with 'unknown' and missing 'City' values with 'not mentioned'.
**Hindi:** Missing 'Department' values ko 'unknown' aur missing 'City' values ko 'not mentioned' se bharein.

In [7]:
df['Department'] = df['Department'].fillna('unknown')


In [8]:
df['City'] = df['City'].fillna('not mentioned')


**English:** Display the DataFrame after filling some of the missing values.
**Hindi:** Kuchh missing values bharne ke baad DataFrame dikhayein.

In [9]:
df

    Name  Department  Age   Salary           City
0   Amit          HR   25  35000.0           Pune
1  Priya          IT   31  60000.0         Mumbai
2  Rohan     Finance   35      NaN          Delhi
3  Sneha     unknown   28  58000.0           Pune
4    NaN     Finance   32  72000.0        Chennai
5   Tina          HR   41  40000.0  not mentioned
6  Arjun          IT   30  62000.0          Delhi
7   Neha     Finance   31  67000.0           Pune


**English:** Identify and display the rows that still contain any missing values.
**Hindi:** Un rows ko pehchanein aur dikhayein jinmein abhi bhi koi missing value hai.

In [10]:
rows_with_missing = df[df.isnull().any(axis=1)]

In [11]:
rows_with_missing

    Name  Department  Age   Salary     City
2  Rohan     Finance   35      NaN    Delhi
4    NaN     Finance   32  72000.0  Chennai


**English:** For each row with missing values, print the index and the name of the columns that are missing.
**Hindi:** Missing values wali har row ke liye, index aur un columns ke naam print karein jo missing hain.

In [12]:
for idx in rows_with_missing.index:
    missing_cols = df.loc[idx].isnull()
    missing_col_names = missing_cols[missing_cols].index.tolist()
    print(f'Row {idx}: Missing in columns {missing_col_names}')

Row 2: Missing in columns ['Salary']
Row 4: Missing in columns ['Name']
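
The same per-row listing can be produced without an explicit loop. This is a sketch on a three-row stand-in frame (`sample`, `mask`, and `missing_by_row` are hypothetical names) that applies the row-wise lookup to the boolean mask itself.

```python
import numpy as np
import pandas as pd

sample = pd.DataFrame(
    {'Name': ['Rohan', np.nan, 'Tina'], 'Salary': [np.nan, 72000, 40000]},
    index=[2, 4, 5],
)

mask = sample.isna()
# Keep only rows with at least one missing value, then collect the labels of the True columns.
missing_by_row = mask[mask.any(axis=1)].apply(
    lambda row: row[row].index.tolist(), axis=1
)
print(missing_by_row.to_dict())
```

The result is a Series keyed by the original row labels, which makes it easy to join back to the frame or to export.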


**English:** Group data by 'Department' and calculate the mean salary for each.
**Hindi:** Data ko 'Department' ke anusaar group karein aur har ek ke liye mean salary calculate karein.

In [13]:
df.groupby('Department')['Salary'].mean().round()

Department
Finance    69500.0
HR         37500.0
IT         61000.0
unknown    58000.0
Name: Salary, dtype: float64
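
The Finance figure is worth a second look: the department has three rows, but one salary is missing, and `mean()` skips missing values by default. The sketch below reproduces just the Finance salaries in a stand-in Series (`finance` is a hypothetical name).

```python
import numpy as np
import pandas as pd

finance = pd.Series([np.nan, 72000, 67000], name='Salary')

print(finance.mean())              # NaN is skipped: (72000 + 67000) / 2
print(finance.mean(skipna=False))  # NaN propagates when skipna is turned off
print(finance.count())             # number of non-missing values behind the mean
```

Reporting `count()` next to the mean makes it visible how many values each average is based on.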

**English:** Sort the DataFrame by 'Salary' in descending order.
**Hindi:** DataFrame ko 'Salary' ke adhaar par descending order mein sort karein.

In [14]:
df.sort_values(by='Salary',ascending=False)

    Name  Department  Age   Salary           City
4    NaN     Finance   32  72000.0        Chennai
7   Neha     Finance   31  67000.0           Pune
6  Arjun          IT   30  62000.0          Delhi
1  Priya          IT   31  60000.0         Mumbai
3  Sneha     unknown   28  58000.0           Pune
5   Tina          HR   41  40000.0  not mentioned
0   Amit          HR   25  35000.0           Pune
2  Rohan     Finance   35      NaN          Delhi
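
The row with the missing salary lands at the bottom because `sort_values` places NaN last by default, regardless of sort direction. A small sketch of the `na_position` parameter, using a stand-in frame (`sample` is a hypothetical name):

```python
import numpy as np
import pandas as pd

sample = pd.DataFrame({
    'Name': ['Amit', 'Rohan', 'Priya'],
    'Salary': [35000, np.nan, 60000],
})

# Default behaviour: missing salaries are placed last.
default_order = sample.sort_values(by='Salary', ascending=False)

# na_position='first' moves them to the top instead.
nan_first = sample.sort_values(by='Salary', ascending=False, na_position='first')

print(default_order['Name'].tolist())
print(nan_first['Name'].tolist())
```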


**English:** Remove rows where the 'Salary' is missing.
**Hindi:** Un rows ko hatayein jahan 'Salary' missing hai.

In [15]:
# Take an explicit copy so later column assignments act on an independent frame.
df = df.dropna(subset=['Salary']).copy()
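
Dropping rows does not renumber the index: the surviving rows keep their original labels, so labels and positions no longer line up. The sketch below shows this on a stand-in frame and, as one possible follow-up, renumbers with `reset_index(drop=True)`; the names `sample`, `kept`, and `renumbered` are illustrative only.

```python
import numpy as np
import pandas as pd

sample = pd.DataFrame(
    {'Name': ['Priya', 'Rohan', 'Sneha'], 'Salary': [60000, np.nan, 58000]},
    index=[1, 2, 3],
)

kept = sample.dropna(subset=['Salary'])   # original labels 1 and 3 survive
renumbered = kept.reset_index(drop=True)  # optional: relabel the rows 0, 1, ...

print(kept.index.tolist())
print(renumbered.index.tolist())
```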

**English:** Create a new 'category' column based on salary ranges (High, medium, low) using `np.select`.
**Hindi:** `np.select` ka upyog karke salary ranges (High, medium, low) ke aadhar par ek naya 'category' column banayein.

In [16]:
df['category'] = np.select([df['Salary'] > 65000,
                (df['Salary'] > 45000) & (df['Salary'] <= 65000),
                df['Salary'] <= 45000],  # <= so a salary of exactly 45000 counts as 'low'
                ['High', 'medium', 'low'],
                default='unknown')
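
For contiguous numeric ranges like these, `pd.cut` is a possible alternative to `np.select`. This is a sketch rather than the notebook's method, and it assumes that a salary of exactly 45000 should fall into the low band; the Series `salary` holds made-up boundary values chosen to show where each edge lands.

```python
import numpy as np
import pandas as pd

# Illustrative values, including the two boundary salaries 45000 and 65000.
salary = pd.Series([35000, 45000, 60000, 65000, 72000])

# Bins are right-inclusive by default: (-inf, 45000], (45000, 65000], (65000, inf).
category = pd.cut(
    salary,
    bins=[-np.inf, 45000, 65000, np.inf],
    labels=['low', 'medium', 'High'],
)
print(category.tolist())
```

Unlike `np.select`, the result is an ordered categorical column, so sorting by category follows low, medium, High rather than alphabetical order.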


**English:** Replace the missing name in the row labeled 4 with 'john'. After the rows with a missing salary are dropped, label 4 sits at position 3, so a positional `iloc[3, 0]` is fragile; selecting by label and column name with `loc` is more robust.
**Hindi:** Label 4 wali row mein missing naam ko 'john' se replace karein. Missing salary wali rows drop karne ke baad label 4 position 3 par aa jata hai, isliye positional `iloc[3, 0]` fragile hai; `loc` ke saath label aur column naam se select karna adhik majboot hai.

In [17]:
df.loc[4, 'Name'] = 'john'
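
A more general option is to target the missing names with a boolean mask, which needs neither the label nor the position of the row. The sketch below uses a stand-in frame whose index has gaps, as it would after rows are dropped (`sample` and `position` are hypothetical names).

```python
import numpy as np
import pandas as pd

sample = pd.DataFrame(
    {'Name': ['Amit', 'Sneha', np.nan], 'Salary': [35000, 58000, 72000]},
    index=[0, 3, 4],
)

# Label 4 is not at position 4 here: get_loc translates a label into its position.
position = sample.index.get_loc(4)

# Fill every missing name, wherever it is, by selecting rows with a mask.
sample.loc[sample['Name'].isna(), 'Name'] = 'john'

print(position, sample['Name'].tolist())
```

The mask fills every missing name, so it only suits cases where all of them should receive the same replacement.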

**English:** Display the DataFrame with the new 'category' and updated name.
**Hindi:** Nayi 'category' aur updated naam ke saath DataFrame dikhayein.

In [18]:
df

    Name  Department  Age   Salary           City  category
0   Amit          HR   25  35000.0           Pune       low
1  Priya          IT   31  60000.0         Mumbai    medium
3  Sneha     unknown   28  58000.0           Pune    medium
4   john     Finance   32  72000.0        Chennai      High
5   Tina          HR   41  40000.0  not mentioned       low
6  Arjun          IT   30  62000.0          Delhi    medium
7   Neha     Finance   31  67000.0           Pune      High


**English:** Count the occurrences of each salary category.
**Hindi:** Har salary category ki occurrences ginein.

In [19]:
df['category'].value_counts()

category
medium    3
low       2
High      2
Name: count, dtype: int64

**English:** Group by 'Name' and 'category' and count the combinations.
**Hindi:** 'Name' aur 'category' ke anusaar group karein aur combinations ko ginein.

In [20]:
df.groupby('Name')['category'].value_counts()

Name   category
Amit   low         1
Arjun  medium      1
Neha   High        1
Priya  medium      1
Sneha  medium      1
Tina   low         1
john   High        1
Name: count, dtype: int64

**English:** Group by 'Department' and 'category' and find the mean salary for each group.
**Hindi:** 'Department' aur 'category' ke anusaar group karein aur har group ke liye mean salary pata karein.

In [21]:
df.groupby(['Department','category'])['Salary'].mean()

Department  category
Finance     High        69500.0
HR          low         37500.0
IT          medium      61000.0
unknown     medium      58000.0
Name: Salary, dtype: float64

**English:** Find the name of the youngest employee.
**Hindi:** Sabse kam umar ke employee ka naam pata karein.

In [22]:
youngest_employee = df['Age'].idxmin()

In [23]:
name = df.loc[youngest_employee,'Name']
name

'Amit'
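
`idxmin()` returns only the first label at which the minimum occurs, so a tie would go unnoticed. The employee data has no tie, so the sketch below uses made-up ages (an assumption for illustration only) to show how to retrieve every youngest employee.

```python
import pandas as pd

# Hypothetical data with two employees sharing the minimum age.
sample = pd.DataFrame({'Name': ['Amit', 'Sneha', 'Tina'], 'Age': [25, 25, 41]})

# idxmin gives the first matching label only.
first_youngest = sample.loc[sample['Age'].idxmin(), 'Name']

# Comparing against the minimum keeps every row that ties for it.
all_youngest = sample.loc[sample['Age'] == sample['Age'].min(), 'Name'].tolist()

print(first_youngest, all_youngest)
```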

**English:** Create a new 'Name_length' column with the length of each name.
**Hindi:** Har naam ki lambai ke saath ek naya 'Name_length' column banayein.

In [24]:
df['Name_length'] = df['Name'].str.len()


**English:** Perform multiple aggregations on the 'Department' groups to get the total number of employees, average salary, and minimum age.
**Hindi:** 'Department' groups par multiple aggregations perform karein taaki kul employees, average salary, aur minimum age prapt ho sake.

In [25]:
df.groupby('Department').agg(total_employees = ('Name','count'),
                                avg_salary = ('Salary','mean'),
                                min_age = ('Age','min'))

            total_employees  avg_salary  min_age
Department
Finance                   2     69500.0       31
HR                        2     37500.0       25
IT                        2     61000.0       30
unknown                   1     58000.0       28


**English:** Convert all city names to uppercase.
**Hindi:** Sabhi shahar ke naamo ko uppercase mein convert karein.

In [26]:
df['City'] = df['City'].str.upper()


**English:** Display the final DataFrame.
**Hindi:** Final DataFrame dikhayein.

In [27]:
df

    Name  Department  Age   Salary           City  category  Name_length
0   Amit          HR   25  35000.0           PUNE       low            4
1  Priya          IT   31  60000.0         MUMBAI    medium            5
3  Sneha     unknown   28  58000.0           PUNE    medium            5
4   john     Finance   32  72000.0        CHENNAI      High            4
5   Tina          HR   41  40000.0  NOT MENTIONED       low            4
6  Arjun          IT   30  62000.0          DELHI    medium            5
7   Neha     Finance   31  67000.0           PUNE      High            4


**English:** Save a subset of the DataFrame (Name, Salary, Age, category) to a new CSV file named 'new_csv_sachin' without the index.
**Hindi:** DataFrame ka ek subset (Name, Salary, Age, category) 'new_csv_sachin' naam ki ek nayi CSV file mein bina index ke save karein.

In [28]:
df[['Name','Salary','Age','category']].to_csv('new_csv_sachin',index = False)
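
To see what `index=False` changes, the export can be checked with a round trip. The sketch below writes to an in-memory buffer instead of a file so it leaves nothing on disk; the stand-in frame and the names `sample`, `csv_text`, and `restored` are assumptions for illustration.

```python
import io

import pandas as pd

sample = pd.DataFrame(
    {
        'Name': ['Tina', 'Neha'],
        'Salary': [40000.0, 67000.0],
        'Age': [41, 31],
        'category': ['low', 'High'],
    },
    index=[5, 7],
)

# Write the CSV into a text buffer; index=False leaves the row labels out.
buffer = io.StringIO()
sample.to_csv(buffer, index=False)
csv_text = buffer.getvalue()

# Reading it back yields a fresh 0-based index, since the labels were never stored.
restored = pd.read_csv(io.StringIO(csv_text))
print(csv_text)
print(restored.index.tolist())
```

The header line contains only the four column names, and the original labels 5 and 7 do not survive the round trip.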