# DataFrame in Pandas:

A DataFrame in Pandas is a two-dimensional labeled data structure that resembles a table or spreadsheet. It consists of rows and columns, where each column can contain different data types (e.g., integers, floats, strings) and is identified by a unique label. DataFrames are highly versatile and widely used for data manipulation, analysis, and visualization tasks in Python.
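
As a minimal sketch of the "different data types per column" idea, the snippet below builds a small table and inspects its shape and column types. The table `people` and its column names are hypothetical, chosen only for illustration.

```python
import pandas as pd

# Hypothetical three-row table; names and values are made up for illustration.
people = pd.DataFrame({
    "name": ["Ann", "Ben", "Cleo"],     # strings
    "age": [31, 45, 27],                # integers
    "height_m": [1.68, 1.80, 1.75],     # floats
})

# Each column keeps its own dtype and is addressed by its label.
column_types = people.dtypes
ages = people["age"]

print(people.shape)    # (number of rows, number of columns)
print(column_types)
```

Selecting a single label, as in `people["age"]`, returns a one-dimensional Series rather than a DataFrame.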

# Example Usage:

In [1]:
import pandas as pd

# Sample data for DataFrame
data = {
    'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Emma'],
    'Age': [25, 30, 35, 28, 32],
    'City': ['London', 'New York', 'Paris', 'Paris', 'Sydney'],
    'Salary': [60000, 75000, 80000, 70000, 65000]
}



In [2]:
# Creating a DataFrame
df = pd.DataFrame(data)

# Displaying the DataFrame
df

| | Name | Age | City | Salary |
|---|---|---|---|---|
| 0 | Alice | 25 | London | 60000 |
| 1 | Bob | 30 | New York | 75000 |
| 2 | Charlie | 35 | Paris | 80000 |
| 3 | David | 28 | Paris | 70000 |
| 4 | Emma | 32 | Sydney | 65000 |


**DataFrame Creation:** We create a DataFrame 'df' from the dictionary 'data', containing information about individuals' names, ages, cities, and salaries.

In [3]:
# Basic DataFrame Information
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 5 entries, 0 to 4
Data columns (total 4 columns):
 #   Column  Non-Null Count  Dtype 
---  ------  --------------  ----- 
 0   Name    5 non-null      object
 1   Age     5 non-null      int64 
 2   City    5 non-null      object
 3   Salary  5 non-null      int64 
dtypes: int64(2), object(2)
memory usage: 288.0+ bytes


**DataFrame Information:** We use the info() method to print a concise summary of the DataFrame: the index range, each column's non-null count and data type, and the approximate memory usage.
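
The non-null counts reported by info() become more useful once a column has missing values. The sketch below assumes a hypothetical `scores` table with one missing entry and cross-checks the summary with count() and isna().

```python
import pandas as pd

# Hypothetical table with one missing score.
scores = pd.DataFrame({
    "student": ["Ann", "Ben", "Cleo"],
    "score": [88.0, None, 92.5],
})

# info() prints the summary; "score" shows fewer non-null values than rows.
scores.info()

# The same information as numbers that can be used in later code.
non_null_counts = scores.count()
missing_counts = scores.isna().sum()
print(missing_counts)
```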

In [4]:
df.describe()

| | Age | Salary |
|---|---|---|
| count | 5.0 | 5.0 |
| mean | 30.0 | 70000.0 |
| std | 3.807887 | 7905.69415 |
| min | 25.0 | 60000.0 |
| 25% | 28.0 | 65000.0 |
| 50% | 30.0 | 70000.0 |
| 75% | 32.0 | 75000.0 |
| max | 35.0 | 80000.0 |


**Summary Statistics:** The describe() method generates summary statistics (count, mean, std, min, 25%, 50%, 75%, max) for numeric columns in the DataFrame.
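
describe() can also be called on a single column, and its result can be indexed like any other Series. The sketch below rebuilds the tutorial's data so it runs on its own; the variable names `salary_stats` and `all_stats` are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie", "David", "Emma"],
    "Age": [25, 30, 35, 28, 32],
    "City": ["London", "New York", "Paris", "Paris", "Sydney"],
    "Salary": [60000, 75000, 80000, 70000, 65000],
})

# Statistics for one column only; individual values are looked up by label.
salary_stats = df["Salary"].describe()
print(salary_stats["mean"], salary_stats["max"])

# include="all" also summarises text columns (count, unique, top, freq).
all_stats = df.describe(include="all")
print(all_stats)
```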

In [5]:
filtered_df = df[df['Salary'] > 70000]
filtered_df

| | Name | Age | City | Salary |
|---|---|---|---|---|
| 1 | Bob | 30 | New York | 75000 |
| 2 | Charlie | 35 | Paris | 80000 |


**Filtering Data:** We filter the DataFrame with a boolean condition, keeping only rows where the salary is strictly greater than 70000. David, who earns exactly 70000, is therefore excluded.
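
Conditions can be combined as well. The following sketch, which rebuilds the same data so it is self-contained, uses `&` for "and" and isin() for membership tests. Each condition needs its own parentheses because `&` binds more tightly than comparison operators.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie", "David", "Emma"],
    "Age": [25, 30, 35, 28, 32],
    "City": ["London", "New York", "Paris", "Paris", "Sydney"],
    "Salary": [60000, 75000, 80000, 70000, 65000],
})

# Both conditions must hold: salary of at least 70000 AND city is Paris.
paris_high = df[(df["Salary"] >= 70000) & (df["City"] == "Paris")]

# Rows whose city is one of several values.
london_or_sydney = df[df["City"].isin(["London", "Sydney"])]

print(paris_high)
print(london_or_sydney)
```

Filtering returns a new DataFrame that keeps the original index labels, which is why the filtered output above starts at index 1 rather than 0.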

In [6]:
df['Experience'] = [3, 5, 7, 4, 6]

df

| | Name | Age | City | Salary | Experience |
|---|---|---|---|---|---|
| 0 | Alice | 25 | London | 60000 | 3 |
| 1 | Bob | 30 | New York | 75000 | 5 |
| 2 | Charlie | 35 | Paris | 80000 | 7 |
| 3 | David | 28 | Paris | 70000 | 4 |
| 4 | Emma | 32 | Sydney | 65000 | 6 |


**Adding a New Column:** We add a new column 'Experience' to the DataFrame, representing the number of years of work experience for each individual.
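
A new column can also be computed from existing ones instead of being typed in as a list. The sketch below is one way to do this; the column names `StartAge` and `Senior` are hypothetical and not part of the original data.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie", "David", "Emma"],
    "Age": [25, 30, 35, 28, 32],
    "Experience": [3, 5, 7, 4, 6],
})

# Column arithmetic is applied element by element, without an explicit loop.
df["StartAge"] = df["Age"] - df["Experience"]

# assign() returns a new DataFrame and leaves df itself unchanged.
with_flag = df.assign(Senior=df["Experience"] >= 5)

# A list whose length differs from the number of rows is rejected.
try:
    df["Broken"] = [1, 2]
except ValueError as err:
    length_error = str(err)
    print("Could not add column:", length_error)
```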

In [7]:
avg_salary_by_city = df.groupby('City')['Salary'].mean()
avg_salary_by_city

City
London      60000.0
New York    75000.0
Paris       75000.0
Sydney      65000.0
Name: Salary, dtype: float64

**Grouping and Aggregation:** We group the DataFrame by the 'City' column and calculate the average salary for each city using the groupby() and mean() methods.
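
Several statistics can be computed per group in one step with agg(). A minimal sketch on the same data, rebuilt here so the snippet runs on its own:

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie", "David", "Emma"],
    "City": ["London", "New York", "Paris", "Paris", "Sydney"],
    "Salary": [60000, 75000, 80000, 70000, 65000],
})

# One row per city, one column per requested statistic.
city_summary = df.groupby("City")["Salary"].agg(["mean", "min", "max", "count"])
print(city_summary)
```

Paris is the only city with two rows, so it is the only group where the mean, minimum, and maximum differ.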

DataFrames are the central data structure in Pandas. Creating, inspecting, filtering, extending, and grouping them, as shown above, underlies most data analysis and manipulation tasks on structured datasets.

# Assignment

In this lab exercise, you will explore a dataset containing information about employees in a company using Pandas. You will perform various data manipulation and analysis tasks to gain insights into the employee data.

**Dataset:**
You are provided with a dictionary containing employee information:

In [8]:
data = {
    'EmployeeID': [101, 102, 103, 104, 105],
    'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Emma'],
    'Department': ['HR', 'IT', 'Finance', 'Marketing', 'Sales'],
    'Salary': [60000, 75000, 80000, 70000, 65000]
}

## Tasks:

**DataFrame Creation:**
- Create a Pandas DataFrame named 'employees' from the provided dictionary 'data'.

**Data Exploration:**
- Display the first 3 rows of the DataFrame to get an overview of the data. (Use the head() method for this.)
- Check the summary statistics for the 'Salary' column. (Use the describe() method for this.)
- Determine the number of employees in each department.

**Data Analysis:**
- Calculate and display the average salary of employees.
- Identify the employee(s) with the highest salary and their details.
- Determine the department with the lowest average salary.
- Add a new column 'Salary Increase' to the DataFrame containing each employee's salary after a 10% increase.
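
As a hint, the sketch below shows some methods that may be useful for these tasks, applied to an unrelated, hypothetical `products` table rather than the employee data. The table and all variable names are made up for illustration, and other approaches work as well.

```python
import pandas as pd

# Hypothetical data, unrelated to the assignment's dataset.
products = pd.DataFrame({
    "product": ["pen", "notebook", "lamp", "desk", "chair", "stapler"],
    "category": ["stationery", "stationery", "furniture",
                 "furniture", "furniture", "stationery"],
    "price": [1.5, 4.0, 30.0, 150.0, 85.0, 7.5],
})

# First rows of the table.
first_rows = products.head(2)

# Number of rows per distinct value in a column.
per_category = products["category"].value_counts()

# Row(s) holding the maximum of a column; ties would all be kept.
most_expensive = products[products["price"] == products["price"].max()]

# Label of the group with the smallest average.
cheapest_category = products.groupby("category")["price"].mean().idxmin()

print(per_category)
print(most_expensive)
print(cheapest_category)
```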