## 1. File Handling in Python

### 1.1. Opening and Closing Files

- **Opening Files**:
  - Use the `open()` function to open a file.
  - Syntax: `open(filename, mode)`
  - Common modes:
    - `'r'`: Read (default).
    - `'w'`: Write (creates a new file or truncates an existing file).
    - `'a'`: Append (creates a new file or appends to an existing file).
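    - `'b'`: Binary mode; combine it with another mode (e.g., `'rb'`, `'wb'`).
    - `'+'`: Open for updating (reading and writing); combine it with another mode (e.g., `'r+'`).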
- **Closing Files**:
  - Close files with the `close()` method to free up resources.
  - Prefer the `with` statement, which closes the file automatically, even if an exception is raised inside the block.

In [24]:
# Example: Opening and closing a file
file = open('example.txt', 'r')
content = file.read()
file.close()

# Using 'with' statement
with open('example.txt', 'r') as file:
    content = file.read()
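
The mode characters listed above can be combined. The following is a minimal sketch, assuming a throwaway temporary directory and a hypothetical file name (`modes_demo.txt`), of how `'w'`, `'a'`, `'r+'`, and `'rb'` behave and of the file being closed once the `with` block ends:

```python
import os
import tempfile

# Work in a throwaway directory so the sketch does not touch real files.
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'modes_demo.txt')  # hypothetical file name

    # 'w' creates the file (or truncates it if it already exists).
    with open(path, 'w') as f:
        f.write('first\n')

    # 'a' keeps the existing content and writes at the end.
    with open(path, 'a') as f:
        f.write('second\n')

    # 'r+' opens an existing file for both reading and writing.
    with open(path, 'r+') as f:
        text = f.read()

    # 'rb' combines read mode with binary mode and returns bytes, not str.
    with open(path, 'rb') as f:
        raw = f.read()

    # The file object is already closed once the 'with' block has ended.
    closed_after_with = f.closed

print(repr(text))
print(type(raw).__name__, closed_after_with)
```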

### 1.2. Reading from Files

- **Reading Entire Content**:
  - Use `read()` to read the entire content of a file.
- **Reading Line by Line**:
  - Use `readline()` to read one line at a time.
  - Use `readlines()` to read all lines into a list.
- **Example**: Reading a file in full, line by line, and into a list.

In [25]:
# Reading entire content
with open('example.txt', 'r') as file:
    content = file.read()
    print(content)

Hello, World!
This is an example text file.
It contains multiple lines of text.
We will use this file to test code.


In [26]:
# Reading line by line
with open('example.txt', 'r') as file:
    line = file.readline()
    while line:
        print(line.strip())
        line = file.readline()

Hello, World!
This is an example text file.
It contains multiple lines of text.
We will use this file to test code.


In [27]:
# Reading all lines into a list
with open('example.txt', 'r') as file:
    lines = file.readlines()
    for line in lines:
        print(line.strip())

Hello, World!
This is an example text file.
It contains multiple lines of text.
We will use this file to test code.
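
A file object is also an iterator over its lines, so a `for` loop can replace the explicit `readline()` loop. The sketch below assumes a small temporary copy of `example.txt` created only for the demonstration:

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    # Hypothetical stand-in for example.txt, written here so the sketch is self-contained.
    path = os.path.join(tmp_dir, 'example.txt')
    with open(path, 'w', encoding='utf-8') as f:
        f.write('Hello, World!\nThis is an example text file.\n')

    stripped_lines = []
    with open(path, 'r', encoding='utf-8') as f:
        # Iterating the file yields one line at a time without loading the whole file.
        for line in f:
            stripped_lines.append(line.rstrip('\n'))

for line in stripped_lines:
    print(line)
```

This form reads lazily, which tends to matter for large files where `readlines()` would hold every line in memory at once.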


### 1.3. Writing to Files

- **Writing Text**:
  - Use `write()` to write a string to a file.
  - Use `writelines()` to write a list of strings.
- **Appending Text**:
  - Use mode `'a'` to append to an existing file.
- **Example**: Writing and appending to a file.

In [28]:
# Writing to a file
with open('output.txt', 'w') as file:
    file.write("Hello, World!\n")
    file.write("This is a new line.\n")

In [29]:
# Appending to a file
with open('output.txt', 'a') as file:
    file.write("This line is appended.\n")
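
The cells above use only `write()`. As a minimal sketch of `writelines()`, assuming a temporary file with a hypothetical name: note that `writelines()` does not add line separators, so each string must carry its own `'\n'`.

```python
import os
import tempfile

lines = ['alpha', 'beta', 'gamma']

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'lines_demo.txt')  # hypothetical file name

    with open(path, 'w') as f:
        # writelines() writes the strings as-is, so the newline is added explicitly.
        f.writelines(line + '\n' for line in lines)

    with open(path, 'r') as f:
        written = f.read()

print(repr(written))
```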

### 1.4. Working with Binary Files

Binary modes (`'rb'` and `'wb'`) read and write raw bytes without any text decoding, which suits any non-text format such as images, audio, video, PDFs, and archives. The following cells read an image into memory and write a byte-for-byte copy:

In [30]:
with open('image.jpg', 'rb') as file:
    binary_data = file.read()

In [32]:
with open('copy_image.jpg', 'wb') as file:
    file.write(binary_data)
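
Reading a whole file with `read()` is fine for small files. For larger ones, a common alternative is to copy in fixed-size chunks. The sketch below assumes a hypothetical chunk size of 4096 bytes and uses generated bytes in a temporary directory instead of a real image:

```python
import os
import tempfile

CHUNK_SIZE = 4096  # assumed chunk size; any positive value works

with tempfile.TemporaryDirectory() as tmp_dir:
    src = os.path.join(tmp_dir, 'source.bin')  # hypothetical file names
    dst = os.path.join(tmp_dir, 'copy.bin')

    # Generated payload (16 KiB) standing in for real binary data.
    payload = bytes(range(256)) * 64
    with open(src, 'wb') as f:
        f.write(payload)

    # Copy chunk by chunk; read() returns b'' at end of file.
    with open(src, 'rb') as fin, open(dst, 'wb') as fout:
        while True:
            chunk = fin.read(CHUNK_SIZE)
            if not chunk:
                break
            fout.write(chunk)

    with open(dst, 'rb') as f:
        copied = f.read()

print(len(copied), copied == payload)
```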

What you do with the bytes after reading them depends on the file type and your goal. Common use cases and examples for several file types follow:

##### 1.4.1. Images
   - Resize or Modify: Use libraries like `Pillow` (PIL) or `OpenCV` to process images.
   - Convert Formats: Convert between formats (e.g., `.jpg` to `.png`).
   - Extract Metadata: Use libraries like `exifread` to extract metadata (e.g., camera settings).
   - Apply Filters: Apply filters like blur, sharpen, or grayscale.

**Example**: Resizing an Image.

In [None]:
from PIL import Image
import io

# Read binary image data
with open('image.jpg', 'rb') as file:
    binary_data = file.read()

# Convert binary data to an image object
image = Image.open(io.BytesIO(binary_data))

# Resize the image
resized_image = image.resize((200, 200))

# Save the resized image
resized_image.save('resized_image.jpg')

##### 1.4.2. Audio Files
   - Play Audio: Use libraries like `pydub` or `simpleaudio` to play audio.
   - Convert Formats: Convert between formats (e.g., `.mp3` to `.wav`).
   - Extract Metadata: Use libraries like `mutagen` to extract metadata (e.g., artist, album).
   - Analyze Audio: Use libraries like `librosa` for audio analysis (e.g., frequency, tempo).

**Example**: Converting Audio Formats.

In [None]:
import io

# pydub relies on ffmpeg being installed to decode MP3 files
from pydub import AudioSegment

# Read binary audio data
with open('audio.mp3', 'rb') as file:
    binary_data = file.read()

# Convert binary data to an AudioSegment object
audio = AudioSegment.from_file(io.BytesIO(binary_data), format="mp3")

# Export to WAV format
audio.export("audio.wav", format="wav")

##### 1.4.3. Video Files
   - Extract Frames: Use libraries like `OpenCV` to extract frames from a video.
   - Convert Formats: Convert between formats (e.g., `.mp4` to `.avi`).
   - Edit Videos: Use libraries like `moviepy` to trim, merge, or add effects.
   - Analyze Videos: Perform object detection or motion analysis.

**Example**: Extracting Frames from a Video.

In [None]:
import cv2

# VideoCapture is opened with a file path here, so reading the file
# into memory first is unnecessary
video = cv2.VideoCapture('video.mp4')

frame_index = 0
success, frame = video.read()
while success:
    # Use a numbered file name so each frame is kept rather than overwritten
    cv2.imwrite(f"frame_{frame_index:05d}.jpg", frame)
    frame_index += 1
    success, frame = video.read()

# Release the underlying file handle
video.release()

##### 1.4.4. Documents
   - Extract Text: Use libraries like `PyPDF2` (for PDFs) or `python-docx` (for Word documents).
   - Convert Formats: Convert between formats (e.g., `.pdf` to `.docx`).
   - Edit Documents: Modify content, add tables, or insert images.

**Example**: Extracting Text from a PDF.

In [None]:
import io

from PyPDF2 import PdfReader

# Read binary PDF data
with open('document.pdf', 'rb') as file:
    binary_data = file.read()

# Convert binary data to a PDF reader object
reader = PdfReader(io.BytesIO(binary_data))

# Extract text from the first page
text = reader.pages[0].extract_text()
print(text)

##### 1.4.5. Archives
   - Extract Files: Use libraries like `zipfile` or `tarfile` to extract files from archives.
   - Create Archives: Compress files into `.zip` or `.tar.gz` formats.
   - List Contents: Inspect the contents of an archive.

**Example**: Extracting Files from a ZIP Archive.

In [None]:
import io
import zipfile

# Read binary ZIP data
with open('archive.zip', 'rb') as file:
    binary_data = file.read()

# Convert binary data to a ZipFile object
with zipfile.ZipFile(io.BytesIO(binary_data)) as zip_ref:
    zip_ref.extractall("extracted_files")

##### Summary:
Once you read binary files, you can:
- Process the data (e.g., resize images, convert formats).
- Analyze the data (e.g., extract metadata, perform statistical analysis).
- Modify the data (e.g., edit documents, apply filters).
- Use the data (e.g., play audio, execute binaries).

The specific operations depend on the file type and your goals. The standard library and third-party packages such as those shown above cover most common binary formats.

### 1.5. Handling File Paths

- **Using `os` and `os.path`**:
  - `os.getcwd()`: Get current working directory.
  - `os.path.join()`: Join path components.
  - `os.path.exists()`: Check if a path exists.
- **Example**: Working with file paths.

In [33]:
import os

# Getting current working directory
current_dir = os.getcwd()
print(f"Current Directory: {current_dir}")

# Joining paths
file_path = os.path.join(current_dir, 'data', 'example.txt')
print(f"File Path: {file_path}")

# Checking if a path exists
if os.path.exists(file_path):
    print("File exists.")
else:
    print("File does not exist.")

Current Directory: c:\GR\Work\СПбГУ\c
File Path: c:\GR\Work\СПбГУ\c\data\example.txt
File does not exist.
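
The cell above reports that the file does not exist. As a minimal sketch of the opposite case, the following creates the `data` directory and file inside a temporary directory (an assumption made so the example does not depend on the local machine) and checks `os.path.exists()` before and after:

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as base_dir:
    # Build the path from components instead of hard-coding separators.
    data_dir = os.path.join(base_dir, 'data')
    file_path = os.path.join(data_dir, 'example.txt')

    exists_before = os.path.exists(file_path)

    # Create the directory (no error if it already exists) and then the file.
    os.makedirs(data_dir, exist_ok=True)
    with open(file_path, 'w') as f:
        f.write('hello\n')

    exists_after = os.path.exists(file_path)

# The temporary directory is removed when the 'with' block ends.
exists_after_cleanup = os.path.exists(file_path)

print(exists_before, exists_after, exists_after_cleanup)
```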


### 1.6. Error Handling in File Operations

- **Handling Exceptions**:
  - Use `try-except` blocks to handle file-related errors.
- **Example**: Handling file not found error.

In [34]:
try:
    with open('nonexistent.txt', 'r') as file:
        content = file.read()
except FileNotFoundError:
    print("File not found.")
except OSError as e:  # IOError is an alias of OSError in Python 3
    print(f"An error occurred: {e}")

File not found.
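
The same pattern can be wrapped in a small helper. This is a sketch only: `read_text_or_default` and the file name are hypothetical, and returning a default value is one possible policy among several (re-raising or logging may suit other programs better).

```python
def read_text_or_default(path, default=''):
    """Return the file's text, or `default` if the file cannot be read."""
    try:
        with open(path, 'r', encoding='utf-8') as f:
            return f.read()
    except FileNotFoundError:
        # Most specific handler first: FileNotFoundError is a subclass of OSError.
        return default
    except OSError as e:
        # Any other OS-level failure (permissions, path is a directory, ...).
        print(f"An error occurred: {e}")
        return default


# Hypothetical file name that is assumed not to exist.
result = read_text_or_default('definitely_missing_file_12345.txt', default='(no data)')
print(result)
```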


### 1.7. Practical Example: Reading and Processing a CSV File

- **Reading a CSV File**:
  - Use `csv` module to read CSV files.
- **Example**: Reading and processing a CSV file.

In [35]:
import csv

# Reading a CSV file
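# newline='' is what the csv module documentation recommends for files passed to csv.reader
with open('data.csv', 'r', newline='') as file: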
    reader = csv.reader(file)
    header = next(reader)
    data = [row for row in reader]

# Displaying the header and first few rows
print(f"Header: {header}")
for row in data[:5]:
    print(row)

Header: ['Name', 'Age', 'City', 'Occupation', 'Salary']
['Alice', '28', 'New York', 'Data Scientist', '95000']
['Bob', '34', 'San Francisco', 'Software Engineer', '110000']
['Charlie', '22', 'Chicago', 'Graphic Designer', '60000']
['Diana', '45', 'Los Angeles', 'Project Manager', '120000']
['Eve', '30', 'Seattle', 'Product Manager', '105000']
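
The cell above reads the rows but leaves every value as a string. As a sketch of the processing step, the following uses `csv.DictReader` on an in-memory copy of the first three rows shown above (held in `io.StringIO` so no file is needed) and converts `Salary` to `int` before averaging:

```python
import csv
import io

# In-memory stand-in for data.csv, using rows from the output above.
csv_text = (
    "Name,Age,City,Occupation,Salary\n"
    "Alice,28,New York,Data Scientist,95000\n"
    "Bob,34,San Francisco,Software Engineer,110000\n"
    "Charlie,22,Chicago,Graphic Designer,60000\n"
)

# DictReader uses the header row as keys, so columns are accessed by name.
reader = csv.DictReader(io.StringIO(csv_text))
rows = list(reader)

# csv returns strings; convert explicitly before doing arithmetic.
salaries = [int(row['Salary']) for row in rows]
average_salary = sum(salaries) / len(salaries)

print(f"Rows: {len(rows)}")
print(f"Average salary: {average_salary:.2f}")
```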


## 2. Importing and Exporting Data



### 2.1. Importing Data from CSV Files

- **Using `pandas.read_csv()`**:
  - Reads a CSV file into a DataFrame.
  - Common parameters: `filepath_or_buffer`, `sep`, `header`, `index_col`.
- **Example**: Reading a CSV file.

In [36]:
import pandas as pd

# Reading a CSV file
df = pd.read_csv('data.csv')

# Displaying the first few rows
print(df.head())

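# Specifying a separator and header row explicitly.
# data.csv is comma-separated, so sep=';' leaves each row in a single column
# (see the second table in the output)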
df = pd.read_csv('data.csv', sep=';', header=0)
print(df.head())

      Name  Age           City         Occupation  Salary
0    Alice   28       New York     Data Scientist   95000
1      Bob   34  San Francisco  Software Engineer  110000
2  Charlie   22        Chicago   Graphic Designer   60000
3    Diana   45    Los Angeles    Project Manager  120000
4      Eve   30        Seattle    Product Manager  105000
                 Name,Age,City,Occupation,Salary
0         Alice,28,New York,Data Scientist,95000
1  Bob,34,San Francisco,Software Engineer,110000
2      Charlie,22,Chicago,Graphic Designer,60000
3    Diana,45,Los Angeles,Project Manager,120000
4          Eve,30,Seattle,Product Manager,105000
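
The second table above shows what happens when `sep` does not match the file. A minimal sketch of the matching case, assuming a small semicolon-separated sample held in memory, also shows `index_col`:

```python
import io

import pandas as pd

# Hypothetical semicolon-separated sample, kept in memory for the sketch.
semicolon_text = "Name;Age;City\nAlice;28;New York\nBob;34;San Francisco\n"

# sep=';' matches the data, header=0 takes column names from the first row,
# and index_col='Name' uses that column as the row index.
df_semicolon = pd.read_csv(
    io.StringIO(semicolon_text), sep=';', header=0, index_col='Name'
)

print(df_semicolon)
```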


### 2.2. Exporting Data to CSV Files

- **Using `DataFrame.to_csv()`**:
  - Writes a DataFrame to a CSV file.
  - Common parameters: `path_or_buf`, `index`, `sep`.
- **Example**: Exporting a DataFrame to a CSV file.

In [37]:
# Exporting to a CSV file
df.to_csv('output_data.csv', index=False)

# Exporting with a different separator (this overwrites the file written above)
df.to_csv('output_data.csv', index=False, sep=';')
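
To see what `index` and `sep` change without writing a file, `to_csv()` can be called without a path, in which case it returns the CSV text. The sketch below uses a small hypothetical DataFrame:

```python
import pandas as pd

# Hypothetical two-row DataFrame for the sketch.
df_small = pd.DataFrame({'Name': ['Alice', 'Bob'], 'Age': [28, 34]})

# With no path, to_csv() returns the CSV as a string.
with_index = df_small.to_csv()
without_index = df_small.to_csv(index=False, sep=';')

print(with_index)
print(without_index)
```

With the default `index=True`, the row labels become an unnamed first column; `index=False` omits them.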

### 2.3. Importing Data from Excel Files

- **Using `pandas.read_excel()`**:
  - Reads an Excel file into a DataFrame.
  - Common parameters: `io`, `sheet_name`, `header`, `index_col`.
- **Example**: Reading an Excel file.

In [40]:
import openpyxl  # not used directly; pandas needs it installed to read .xlsx files

# Reading an Excel file
df = pd.read_excel('data.xlsx', sheet_name='data')

# Displaying the first few rows
print(df.head())

      Name  Age           City         Occupation  Salary
0    Alice   28       New York     Data Scientist   95000
1      Bob   34  San Francisco  Software Engineer  110000
2  Charlie   22        Chicago   Graphic Designer   60000
3    Diana   45    Los Angeles    Project Manager  120000
4      Eve   30        Seattle    Product Manager  105000


In [41]:
# Reading a specific sheet by name or index
df = pd.read_excel('data.xlsx', sheet_name=0)  # First sheet
print(df.head())

      Name  Age           City         Occupation  Salary
0    Alice   28       New York     Data Scientist   95000
1      Bob   34  San Francisco  Software Engineer  110000
2  Charlie   22        Chicago   Graphic Designer   60000
3    Diana   45    Los Angeles    Project Manager  120000
4      Eve   30        Seattle    Product Manager  105000


### 2.4. Exporting Data to Excel Files

- **Using `DataFrame.to_excel()`**:
  - Writes a DataFrame to an Excel file.
  - Common parameters: `excel_writer`, `sheet_name`, `index`.
- **Example**: Exporting a DataFrame to an Excel file.

In [42]:
# Exporting to an Excel file
df.to_excel('data.xlsx', sheet_name='data', index=False)

# Exporting multiple DataFrames to different sheets of one workbook
with pd.ExcelWriter('output_data.xlsx') as writer:
    df.to_excel(writer, sheet_name='Sheet1', index=False)
    df.head(3).to_excel(writer, sheet_name='Sheet2', index=False)

### 2.5. Importing Data from JSON Files

- **Using `pandas.read_json()`**:
  - Reads a JSON file into a DataFrame.
  - Common parameters: `path_or_buf`, `orient`, `lines`.
- **Example**: Reading a JSON file.

In [43]:
# Reading a JSON file
df = pd.read_json('data.json')

# Displaying the first few rows
print(df.head())

      Name  Age           City         Occupation  Salary
0    Alice   28       New York     Data Scientist   95000
1      Bob   34  San Francisco  Software Engineer  110000
2  Charlie   22        Chicago   Graphic Designer   60000
3    Diana   45    Los Angeles    Project Manager  120000
4      Eve   30        Seattle    Product Manager  105000


In [44]:
# Reading JSON lines format
df = pd.read_json('data2.jsonl', lines=True)
print(df.head())

      Name  Age           City         Occupation  Salary
0    Alice   28       New York     Data Scientist   95000
1      Bob   34  San Francisco  Software Engineer  110000
2  Charlie   22        Chicago   Graphic Designer   60000
3    Diana   45    Los Angeles    Project Manager  120000
4      Eve   30        Seattle    Product Manager  105000


### 2.6. Exporting Data to JSON Files

- **Using `DataFrame.to_json()`**:
  - Writes a DataFrame to a JSON file.
  - Common parameters: `path_or_buf`, `orient`, `lines`.
- **Example**: Exporting a DataFrame to a JSON file.

In [45]:
# Exporting to a JSON file
df.to_json('output_data.json', orient='records')

# Exporting in JSON lines format
df.to_json('output_data.jsonl', orient='records', lines=True)
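
The difference between the two layouts is easier to see on a small example. The following sketch, which assumes a hypothetical two-row DataFrame, serializes to strings (no path given) and reads the JSON Lines text back through `io.StringIO`:

```python
import io
import json

import pandas as pd

# Hypothetical two-row DataFrame for the sketch.
df_small = pd.DataFrame({'Name': ['Alice', 'Bob'], 'Age': [28, 34]})

# orient='records' produces one JSON array of row objects.
records_json = df_small.to_json(orient='records')

# lines=True produces one JSON object per line (JSON Lines).
lines_json = df_small.to_json(orient='records', lines=True)

# The records layout is ordinary JSON and can be parsed with the json module.
parsed = json.loads(records_json)

# Read the JSON Lines text back into a DataFrame.
df_back = pd.read_json(io.StringIO(lines_json), lines=True)

print(records_json)
print(lines_json)
print(df_back)
```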

### 2.7. Handling Missing Data

- **Identifying Missing Data**:
  - Use `isna()` (or its alias `isnull()`) to detect missing values.
- **Handling Missing Data**:
  - Use `fillna()` to fill missing values.
  - Use `dropna()` to drop rows or columns with missing values.
- **Example**: Handling missing data in a DataFrame.

In [46]:
import pandas as pd

# Reading data from CSV file
df = pd.read_csv('test_data.csv')

print("Original data:")
print(df)
print("\n")

# Identifying missing data
print("Identifying missing data:")
print(df.isna())
print("\n")

# Filling missing data with zeros
df_filled = df.fillna(0)
print("Data after filling with zeros:")
print(df_filled)
print("\n")

# Dropping rows with missing data
df_dropped = df.dropna()
print("Data after dropping rows with missing values:")
print(df_dropped)

Original data:
             name   age           city   salary  experience
0        John Doe  30.0       New York  75000.0         5.0
1      Jane Smith   NaN  San Francisco  80000.0         3.0
2     Bob Johnson  45.0            NaN  90000.0         8.0
3     Alice Brown  28.0        Chicago      NaN         NaN
4  Charlie Wilson  35.0    Los Angeles  85000.0         6.0


Identifying missing data:
    name    age   city  salary  experience
0  False  False  False   False       False
1  False   True  False   False       False
2  False  False   True   False       False
3  False  False  False    True        True
4  False  False  False   False       False


Data after filling with zeros:
             name   age           city   salary  experience
0        John Doe  30.0       New York  75000.0         5.0
1      Jane Smith   0.0  San Francisco  80000.0         3.0
2     Bob Johnson  45.0              0  90000.0         8.0
3     Alice Brown  28.0        Chicago      0.0         0.0
4  Charlie Wilson  35.0    Los Angeles  85000.0         6.0
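
Filling every gap with `0` also puts `0` into the text column `city`, as the output above shows. The sketch below rebuilds the same table in memory (an assumption, since `test_data.csv` itself is not available here) and shows per-column counts, column-specific fills, and the row and column variants of `dropna()`:

```python
import numpy as np
import pandas as pd

# In-memory reconstruction of the table printed above.
df_missing = pd.DataFrame({
    'name': ['John Doe', 'Jane Smith', 'Bob Johnson', 'Alice Brown', 'Charlie Wilson'],
    'age': [30.0, np.nan, 45.0, 28.0, 35.0],
    'city': ['New York', 'San Francisco', None, 'Chicago', 'Los Angeles'],
    'salary': [75000.0, 80000.0, 90000.0, np.nan, 85000.0],
    'experience': [5.0, 3.0, 8.0, np.nan, 6.0],
})

# Count missing values per column.
missing_per_column = df_missing.isna().sum()

# Fill each column with a value that suits its type.
filled = df_missing.fillna({'age': df_missing['age'].mean(), 'city': 'Unknown'})

# Drop only the rows where 'salary' is missing.
rows_with_salary = df_missing.dropna(subset=['salary'])

# Drop every column that contains at least one missing value.
complete_columns = df_missing.dropna(axis=1)

print(missing_per_column)
print(filled)
print(rows_with_salary)
print(complete_columns.columns.tolist())
```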

### 2.8. Practical Example: Importing, Processing, and Exporting

- **Scenario**: Import financial data from a CSV file, process it, and export the results to an Excel file.
- **Example**: Calculating the average price and exporting the results.

In [77]:
import pandas as pd

# Importing financial data
df = pd.read_csv('data1.csv')

# Displaying the first few rows
print("Original data:")
print(df.head())
print("\n")

# Calculating the average price
df['Average_Price'] = df[['Open', 'Close']].mean(axis=1)

# Displaying the updated DataFrame
print("Data with average price:")
print(df.head())
print("\n")

# Exporting the results to an Excel file
df.to_excel('processed_financial_data.xlsx', index=False)
print("Data has been exported to 'processed_financial_data.xlsx'")

Original data:
         Date    Open   Close   Volume
0  2024-01-01  150.25  152.30  1000000
1  2024-01-02  152.30  153.45  1200000
2  2024-01-03  153.45  151.80   950000
3  2024-01-04  151.80  154.20  1100000
4  2024-01-05  154.20  155.50  1300000


Data with average price:
         Date    Open   Close   Volume  Average_Price
0  2024-01-01  150.25  152.30  1000000        151.275
1  2024-01-02  152.30  153.45  1200000        152.875
2  2024-01-03  153.45  151.80   950000        152.625
3  2024-01-04  151.80  154.20  1100000        153.000
4  2024-01-05  154.20  155.50  1300000        154.850


Data has been exported to 'processed_financial_data.xlsx'


## 3. Introduction to NumPy

- **Objective**: Understand the basics of NumPy, including arrays, operations, and functions.
- **Key Concepts**: Arrays, array attributes, element-wise operations, broadcasting.

### 3.1. Installing and Importing NumPy

- **Installation**:
  - Install NumPy using pip: `pip install numpy`.
- **Importing**:
  - Import NumPy in your script: `import numpy as np`.

In [48]:
# Installing NumPy
# Run this command in your terminal or command prompt
# pip install numpy

# Importing NumPy
import numpy as np

### 3.2. Creating NumPy Arrays

- **Creating Arrays**:
  - From a list: `np.array()`.
  - Filled with zeros: `np.zeros()`.
  - Filled with ones: `np.ones()`.
  - With a range of values: `np.arange()`, `np.linspace()`.
  - Identity matrix: `np.eye()`.
  - Random values: `np.random.rand()`, `np.random.randn()`.
- **Example**: Creating different types of arrays.

In [49]:
# Creating an array from a list
arr = np.array([1, 2, 3, 4, 5])
print(arr)

# Creating an array filled with zeros
zeros_arr = np.zeros((3, 3))
print(zeros_arr)

# Creating an array filled with ones
ones_arr = np.ones((2, 4))
print(ones_arr)

# Creating an array with a range of values
range_arr = np.arange(10)
print(range_arr)

# Creating an array with linearly spaced values
linspace_arr = np.linspace(0, 1, 5)
print(linspace_arr)

# Creating an identity matrix
identity_matrix = np.eye(3)
print(identity_matrix)

# Creating an array with random values
random_arr = np.random.rand(2, 3)
print(random_arr)

# Creating an array with random values from a standard normal distribution
random_normal_arr = np.random.randn(2, 3)
print(random_normal_arr)

[1 2 3 4 5]
[[0. 0. 0.]
 [0. 0. 0.]
 [0. 0. 0.]]
[[1. 1. 1. 1.]
 [1. 1. 1. 1.]]
[0 1 2 3 4 5 6 7 8 9]
[0.   0.25 0.5  0.75 1.  ]
[[1. 0. 0.]
 [0. 1. 0.]
 [0. 0. 1.]]
[[0.16749055 0.97234774 0.76606464]
 [0.10736104 0.91032889 0.89260423]]
[[-0.49947474  1.4399717   1.11369459]
 [ 1.64548204 -0.04619219  0.1411855 ]]
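
The random arrays above differ on every run. As a sketch of reproducible random values, NumPy's `Generator` interface (`np.random.default_rng`) accepts a seed; the seed value `42` below is an arbitrary assumption:

```python
import numpy as np

# A seeded generator produces the same sequence every time it is created.
rng = np.random.default_rng(seed=42)

# Uniform values in [0, 1), comparable to np.random.rand(2, 3).
uniform_arr = rng.random((2, 3))

# Standard normal values, comparable to np.random.randn(2, 3).
normal_arr = rng.standard_normal((2, 3))

# A second generator with the same seed repeats the first draw.
rng_again = np.random.default_rng(seed=42)
same_uniform = rng_again.random((2, 3))

print(uniform_arr.shape, normal_arr.shape)
print(np.array_equal(uniform_arr, same_uniform))
```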


### 3.3. Array Attributes

- **Shape**: `arr.shape`.
- **Size**: `arr.size`.
- **Data Type**: `arr.dtype`.
- **Example**: Accessing array attributes.

In [50]:
# Creating an array
arr = np.array([[1, 2, 3], [4, 5, 6]])

# Accessing shape
print(f"Shape: {arr.shape}")

# Accessing size
print(f"Size: {arr.size}")

# Accessing data type
print(f"Data Type: {arr.dtype}")

Shape: (2, 3)
Size: 6
Data Type: int64


### 3.4. Array Operations

- **Element-wise Operations**:
  - Arithmetic operations: `+`, `-`, `*`, `/`, `**`.
  - Comparison operations: `==`, `!=`, `>`, `<`, `>=`, `<=`.
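- **Broadcasting**:
  - Operations between arrays of different but compatible shapes: NumPy stretches the smaller operand (for example, a scalar or a single row) across the larger one.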
- **Example**: Performing element-wise operations and broadcasting.

In [51]:
# Element-wise operations
arr1 = np.array([1, 2, 3])
arr2 = np.array([4, 5, 6])

# Addition
print(arr1 + arr2)

# Subtraction
print(arr1 - arr2)

# Multiplication
print(arr1 * arr2)

# Division
print(arr1 / arr2)

# Exponentiation
print(arr1 ** 2)

# Comparison operations
print(arr1 == arr2)
print(arr1 > arr2)

# Broadcasting
arr = np.array([[1, 2, 3], [4, 5, 6]])
scalar = 2

# Adding a scalar to an array
print(arr + scalar)

# Multiplying an array by a scalar
print(arr * scalar)

[5 7 9]
[-3 -3 -3]
[ 4 10 18]
[0.25 0.4  0.5 ]
[1 4 9]
[False False False]
[False False False]
[[3 4 5]
 [6 7 8]]
[[ 2  4  6]
 [ 8 10 12]]
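
The cell above broadcasts a scalar. A minimal sketch of broadcasting between two arrays, using hypothetical values, combines a column of shape `(3, 1)` with a row of shape `(3,)` and also shows that incompatible shapes raise `ValueError`:

```python
import numpy as np

column = np.array([[0], [10], [20]])  # shape (3, 1)
row = np.array([1, 2, 3])             # shape (3,)

# Each operand is stretched along its length-1 (or missing) axis,
# so the result has shape (3, 3).
grid = column + row
print(grid)

# Shapes (3,) and (2,) cannot be broadcast together.
try:
    np.array([1, 2, 3]) + np.array([1, 2])
    shapes_compatible = True
except ValueError:
    shapes_compatible = False

print(shapes_compatible)
```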


### 3.5. NumPy Functions

- **Mathematical Functions**:
  - `np.sum()`, `np.mean()`, `np.std()`, `np.min()`, `np.max()`.
- **Array Manipulation**:
  - `np.reshape()`, `np.transpose()`, `np.concatenate()`.
- **Example**: Using NumPy functions.

In [52]:
# Creating an array
arr = np.array([[1, 2, 3], [4, 5, 6]])

# Sum of all elements
print(f"Sum: {np.sum(arr)}")

# Mean of all elements
print(f"Mean: {np.mean(arr)}")

# Standard deviation of all elements (population formula, ddof=0, by default)
print(f"Standard Deviation: {np.std(arr)}")

# Minimum value
print(f"Minimum: {np.min(arr)}")

# Maximum value
print(f"Maximum: {np.max(arr)}")

# Reshaping an array
reshaped_arr = np.reshape(arr, (3, 2))
print(f"Reshaped Array:\n{reshaped_arr}")

# Transposing an array
transposed_arr = np.transpose(arr)
print(f"Transposed Array:\n{transposed_arr}")

# Concatenating arrays
arr1 = np.array([[1, 2], [3, 4]])
arr2 = np.array([[5, 6]])
concatenated_arr = np.concatenate((arr1, arr2), axis=0)
print(f"Concatenated Array:\n{concatenated_arr}")

Sum: 21
Mean: 3.5
Standard Deviation: 1.707825127659933
Minimum: 1
Maximum: 6
Reshaped Array:
[[1 2]
 [3 4]
 [5 6]]
Transposed Array:
[[1 4]
 [2 5]
 [3 6]]
Concatenated Array:
[[1 2]
 [3 4]
 [5 6]]
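
The functions above reduce the whole array to one number. As a sketch, the `axis` argument restricts the reduction to columns or rows, and `ddof=1` switches `np.std()` from the population formula to the sample formula:

```python
import numpy as np

arr = np.array([[1, 2, 3], [4, 5, 6]])

# axis=0 collapses the rows, giving one value per column.
column_sums = np.sum(arr, axis=0)

# axis=1 collapses the columns, giving one value per row.
row_means = np.mean(arr, axis=1)

# np.std() divides by N by default; ddof=1 divides by N - 1.
population_std = np.std(arr)
sample_std = np.std(arr, ddof=1)

print(column_sums)
print(row_means)
print(population_std, sample_std)
```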


### 3.6. Practical Example: Analyzing Data with NumPy

- **Scenario**: Calculate basic statistics for a set of financial data.
- **Example**: Calculating mean, standard deviation, and other statistics for stock prices.

In [53]:
import numpy as np

# Sample financial data (stock prices)
stock_prices = np.array([100, 102, 101, 105, 107, 110, 108, 109, 111, 112])

# Calculating mean price
mean_price = np.mean(stock_prices)
print(f"Mean Price: {mean_price}")

# Calculating standard deviation (population formula; pass ddof=1 for the sample version)
std_dev = np.std(stock_prices)
print(f"Standard Deviation: {std_dev}")

# Calculating minimum and maximum prices
min_price = np.min(stock_prices)
max_price = np.max(stock_prices)
print(f"Minimum Price: {min_price}, Maximum Price: {max_price}")

# Calculating daily returns
daily_returns = np.diff(stock_prices) / stock_prices[:-1] * 100
print(f"Daily Returns (%): {daily_returns}")

Mean Price: 106.5
Standard Deviation: 4.080441152620633
Minimum Price: 100, Maximum Price: 112
Daily Returns (%): [ 2.         -0.98039216  3.96039604  1.9047619   2.80373832 -1.81818182
  0.92592593  1.83486239  0.9009009 ]


## 4. Introduction to Pandas



- **Objective**: Understand the basics of Pandas, including DataFrames, Series, and data manipulation.
- **Key Concepts**: DataFrames, Series, indexing, data manipulation.

### 4.1. Installing and Importing Pandas

- **Installation**:
  - Install Pandas using pip: `pip install pandas`.
- **Importing**:
  - Import Pandas in your script: `import pandas as pd`.

In [54]:
# Installing Pandas
# Run this command in your terminal or command prompt
# pip install pandas

# Importing Pandas
import pandas as pd

### 4.2. Creating DataFrames and Series

- **Creating a Series**:
  - A one-dimensional array-like object.
  - Example: `pd.Series([1, 2, 3, 4])`.
- **Creating a DataFrame**:
  - A two-dimensional table-like structure.
  - Example: `pd.DataFrame({'col1': [1, 2], 'col2': [3, 4]})`.
- **Example**: Creating Series and DataFrames.

In [55]:
# Creating a Series
s = pd.Series([1, 2, 3, 4])
print(s)

# Creating a DataFrame
df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 35],
    'City': ['New York', 'Los Angeles', 'Chicago']
})
print(df)

0    1
1    2
2    3
3    4
dtype: int64
      Name  Age         City
0    Alice   25     New York
1      Bob   30  Los Angeles
2  Charlie   35      Chicago


### 4.3. Accessing Data in DataFrames

- **Accessing Columns**:
  - Use column names: `df['column_name']`.
- **Accessing Rows**:
  - Use `loc[]` for label-based indexing.
  - Use `iloc[]` for position-based indexing.
- **Example**: Accessing columns and rows.

In [56]:
# Accessing a column
ages = df['Age']
print(ages)

# Accessing a row by label
row = df.loc[1]
print(row)

# Accessing a row by position
row = df.iloc[1]
print(row)

# Accessing specific cells
cell = df.loc[1, 'Age']
print(cell)

0    25
1    30
2    35
Name: Age, dtype: int64
Name            Bob
Age              30
City    Los Angeles
Name: 1, dtype: object
Name            Bob
Age              30
City    Los Angeles
Name: 1, dtype: object
30
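
With the default index `0, 1, 2`, `loc[1]` and `iloc[1]` return the same row, which hides the difference between them. The sketch below assumes a hypothetical DataFrame with the custom index `10, 20, 30`:

```python
import pandas as pd

# Hypothetical DataFrame whose labels differ from the positions.
df_people = pd.DataFrame(
    {'Name': ['Alice', 'Bob', 'Charlie'], 'Age': [25, 30, 35]},
    index=[10, 20, 30],
)

# loc selects by index label: the row labelled 20.
by_label = df_people.loc[20]

# iloc selects by integer position: the first row.
by_position = df_people.iloc[0]

# The label 1 does not exist here, so df_people.loc[1] would raise KeyError.
label_1_exists = 1 in df_people.index

print(by_label['Name'], by_position['Name'], label_1_exists)
```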


### 4.4. Data Manipulation

- **Adding and Removing Columns**:
  - Add a column: `df['new_column'] = values`.
  - Remove a column: `df.drop('column_name', axis=1)` (returns a new DataFrame; the original is unchanged unless you reassign the result).
- **Filtering Data**:
  - Use boolean indexing: `df[df['Age'] > 30]`.
- **Sorting Data**:
  - Use `sort_values()`: `df.sort_values('Age')`.
- **Example**: Adding, removing, filtering, and sorting data.

In [57]:
# Adding a new column
df['Salary'] = [70000, 80000, 90000]
print(df)

# Removing a column
df = df.drop('City', axis=1)
print(df)

# Filtering data
filtered_df = df[df['Age'] > 30]
print(filtered_df)

# Sorting data
sorted_df = df.sort_values('Age')
print(sorted_df)

      Name  Age         City  Salary
0    Alice   25     New York   70000
1      Bob   30  Los Angeles   80000
2  Charlie   35      Chicago   90000
      Name  Age  Salary
0    Alice   25   70000
1      Bob   30   80000
2  Charlie   35   90000
      Name  Age  Salary
2  Charlie   35   90000
      Name  Age  Salary
0    Alice   25   70000
1      Bob   30   80000
2  Charlie   35   90000
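
Filters can combine several conditions, and sorting can be reversed. A minimal sketch, assuming the same three rows as above: each condition needs its own parentheses, and `&` / `|` are used instead of `and` / `or`.

```python
import pandas as pd

df_staff = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 35],
    'Salary': [70000, 80000, 90000],
})

# Combine conditions element-wise with & (and) or | (or).
mask = (df_staff['Age'] >= 30) & (df_staff['Salary'] < 90000)
selected = df_staff[mask]

# Sort from oldest to youngest.
by_age_desc = df_staff.sort_values('Age', ascending=False)

# drop() returns a new DataFrame; df_staff itself keeps the column.
without_salary = df_staff.drop('Salary', axis=1)

print(selected)
print(by_age_desc)
print(without_salary.columns.tolist(), df_staff.columns.tolist())
```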


### 4.5. Handling Missing Data

- **Identifying Missing Data**:
  - Use `isna()` (or its alias `isnull()`).
- **Handling Missing Data**:
  - Use `fillna()` to fill missing values.
  - Use `dropna()` to drop rows or columns with missing values.
- **Example**: Handling missing data.

In [58]:
# Creating a DataFrame with missing values
df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, None, 35],
    'City': ['New York', None, 'Chicago']
})
print(df)

# Identifying missing data
print(df.isna())

# Filling missing data
df_filled = df.fillna({'Age': 30, 'City': 'Unknown'})
print(df_filled)

# Dropping missing data
df_dropped = df.dropna()
print(df_dropped)

      Name   Age      City
0    Alice  25.0  New York
1      Bob   NaN      None
2  Charlie  35.0   Chicago
    Name    Age   City
0  False  False  False
1  False   True   True
2  False  False  False
      Name   Age      City
0    Alice  25.0  New York
1      Bob  30.0   Unknown
2  Charlie  35.0   Chicago
      Name   Age      City
0    Alice  25.0  New York
2  Charlie  35.0   Chicago


### 4.6. Grouping and Aggregating Data

- **Grouping Data**:
  - Use `groupby()`: `df.groupby('column_name')`.
- **Aggregating Data**:
  - Use aggregation functions: `mean()`, `sum()`, `count()`, etc.
- **Example**: Grouping and aggregating data.

In [59]:
# Creating a DataFrame
df = pd.DataFrame({
    'Department': ['HR', 'IT', 'HR', 'IT', 'Finance'],
    'Employee': ['Alice', 'Bob', 'Charlie', 'David', 'Eve'],
    'Salary': [70000, 80000, 90000, 85000, 95000]
})
print(df)

# Grouping by department
grouped = df.groupby('Department')

# Calculating average salary by department
avg_salary = grouped['Salary'].mean()
print(avg_salary)

# Counting employees by department
employee_count = grouped['Employee'].count()
print(employee_count)

  Department Employee  Salary
0         HR    Alice   70000
1         IT      Bob   80000
2         HR  Charlie   90000
3         IT    David   85000
4    Finance      Eve   95000
Department
Finance    95000.0
HR         80000.0
IT         82500.0
Name: Salary, dtype: float64
Department
Finance    1
HR         2
IT         2
Name: Employee, dtype: int64
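
Several aggregations can be computed in one pass with `agg()`. The sketch below rebuilds the same department data and requests the mean, minimum, maximum, and count of `Salary` per department:

```python
import pandas as pd

df_dept = pd.DataFrame({
    'Department': ['HR', 'IT', 'HR', 'IT', 'Finance'],
    'Employee': ['Alice', 'Bob', 'Charlie', 'David', 'Eve'],
    'Salary': [70000, 80000, 90000, 85000, 95000],
})

# agg() with a list of function names returns one column per aggregation.
summary = df_dept.groupby('Department')['Salary'].agg(['mean', 'min', 'max', 'count'])

print(summary)
```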


### 4.7. Practical Example: Analyzing Data with Pandas

- **Scenario**: Analyze a dataset of stock prices.
- **Example**: Calculating daily returns and moving averages.

In [60]:
import pandas as pd

# Sample financial data (stock prices)
data = {
    'Date': pd.date_range(start='2023-01-01', periods=10),
    'Price': [100, 102, 101, 105, 107, 110, 108, 109, 111, 112]
}
df = pd.DataFrame(data)

# Calculating daily returns
df['Daily_Return'] = df['Price'].pct_change() * 100
print(df)

# Calculating moving average
df['Moving_Average'] = df['Price'].rolling(window=3).mean()
print(df)

        Date  Price  Daily_Return
0 2023-01-01    100           NaN
1 2023-01-02    102      2.000000
2 2023-01-03    101     -0.980392
3 2023-01-04    105      3.960396
4 2023-01-05    107      1.904762
5 2023-01-06    110      2.803738
6 2023-01-07    108     -1.818182
7 2023-01-08    109      0.925926
8 2023-01-09    111      1.834862
9 2023-01-10    112      0.900901
        Date  Price  Daily_Return  Moving_Average
0 2023-01-01    100           NaN             NaN
1 2023-01-02    102      2.000000             NaN
2 2023-01-03    101     -0.980392      101.000000
3 2023-01-04    105      3.960396      102.666667
4 2023-01-05    107      1.904762      104.333333
5 2023-01-06    110      2.803738      107.333333
6 2023-01-07    108     -1.818182      108.333333
7 2023-01-08    109      0.925926      109.000000
8 2023-01-09    111      1.834862      109.333333
9 2023-01-10    112      0.900901      110.666667
