# Day 16 - Filtering and Sorting Data


## Why is Filtering and Sorting Important?
Real-world datasets usually contain more data than a given analysis needs. Filtering narrows a dataset to the rows that matter for the question at hand. Sorting orders those rows so that you can spot trends, compare entries, or prepare the data for later steps.



## Tutorial: Querying DataFrames Based on Conditions
Pandas lets you filter a DataFrame with boolean indexing, the `query()` method, or the `where()` method, and sort it with the `sort_values()` method. Let's explore these techniques with practical examples.


### Filtering Data

In [None]:
!pip install pandas

In [None]:
import pandas as pd
# Example DataFrame
data = {
    'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Eve'],
    'Age': [24, 30, 22, 35, 28],
    'Job': ['Engineer', 'Doctor', 'Artist', 'Engineer', 'Doctor']
}
df = pd.DataFrame(data)

# Filtering for rows where Age is greater than 25
filtered_df = df[df['Age'] > 25]
print("Filtered DataFrame (Age > 25):")
print(filtered_df)
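
Boolean indexing also handles combined conditions. The sketch below rebuilds the same example DataFrame so that it runs on its own. It uses the element-wise operators `&` (and), `|` (or), and `~` (not), with each condition wrapped in its own parentheses, and `isin()` as a shorthand for several equality checks. The variable names are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Eve'],
    'Age': [24, 30, 22, 35, 28],
    'Job': ['Engineer', 'Doctor', 'Artist', 'Engineer', 'Doctor'],
})

# AND: engineers older than 25 (each condition needs its own parentheses)
older_engineers = df[(df['Age'] > 25) & (df['Job'] == 'Engineer')]

# OR: anyone younger than 23 or older than 32
young_or_senior = df[(df['Age'] < 23) | (df['Age'] > 32)]

# NOT combined with isin(): everyone who is neither a Doctor nor an Artist
not_doctor_or_artist = df[~df['Job'].isin(['Doctor', 'Artist'])]

print(older_engineers)
print(young_or_senior)
print(not_doctor_or_artist)
```

Using Python's `and`/`or` keywords instead of `&`/`|` raises an error here, because a whole Series has no single truth value.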


### Using `query()` for Filtering

In [None]:

# Using the query() method
filtered_df = df.query('Age > 25 and Job == "Doctor"')
print("\nFiltered DataFrame (Age > 25 and Job is Doctor):")
print(filtered_df)
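
Thresholds often live in Python variables rather than in the query string. As a minimal sketch, assuming the same example DataFrame, `query()` can reference such variables with the `@` prefix. The names `min_age` and `jobs` are made up for this illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Eve'],
    'Age': [24, 30, 22, 35, 28],
    'Job': ['Engineer', 'Doctor', 'Artist', 'Engineer', 'Doctor'],
})

# Hypothetical thresholds kept outside the query string
min_age = 25
jobs = ['Doctor', 'Engineer']

# @name looks up a Python variable from the surrounding scope
result = df.query('Age > @min_age and Job in @jobs')
print(result)
```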


In [None]:
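# where() keeps the DataFrame's shape and fills non-matching rows with NaN;
# dropna() then removes those rows. Note that the NaN step converts Age to float.
filtered_df = df.where(df['Age'] > 25).dropna(how="any")
print("\nFiltered DataFrame (Age > 25, using where()):")
print(filtered_df)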

### Sorting Data

In [None]:

# Sorting by Age in ascending order
sorted_df = df.sort_values(by='Age')
print("\nDataFrame sorted by Age (ascending):")
print(sorted_df)

# Sorting by Job (ascending), then by Age (descending) within each Job
sorted_df = df.sort_values(by=['Job', 'Age'], ascending=[True, False])
print("\nDataFrame sorted by Job (ascending), then by Age (descending):")
print(sorted_df)
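
Two related helpers are often useful after sorting. The sketch below, again on the same example DataFrame, resets the index after a descending sort and uses `nlargest()` to pick the top rows directly.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Eve'],
    'Age': [24, 30, 22, 35, 28],
    'Job': ['Engineer', 'Doctor', 'Artist', 'Engineer', 'Doctor'],
})

# Descending sort; reset_index(drop=True) replaces the shuffled index with 0..n-1
oldest_first = df.sort_values(by='Age', ascending=False).reset_index(drop=True)

# nlargest() returns the n rows with the highest values in a column
top_two = df.nlargest(2, 'Age')

print(oldest_first)
print(top_two)
```

`sort_values()` returns a new DataFrame and leaves `df` unchanged, which is why the result is assigned to a new name.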



## Use Case: Filtering and Sorting ETF Data for Investment Decisions
In this use case, we'll analyze several ETFs (Exchange-Traded Funds) related to the S&P 500 to support investment decisions. We'll use the `yfinance` library to retrieve price data from Yahoo Finance, calculate return-on-investment (ROI) metrics for each ETF, and then filter and sort the ETFs by those metrics.


### Step 1: Downloading Data for Multiple ETFs

In [None]:

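# Requires the yfinance package (install with: !pip install yfinance)
import yfinance as yf
import pandas as pd

# Define a list of ETF tickers related to the S&P 500
etf_tickers = ['SPY', 'IVV', 'VOO', 'SPLG', 'SPYG', 'SPYD', 'SPYV', 'RSP', 'VXF', 'IJR']

# Download historical price data for each ETF.
# auto_adjust=False keeps the separate 'Adj Close' column used in Step 2;
# newer yfinance releases adjust prices by default and omit that column.
etf_data = {}
for ticker in etf_tickers:
    etf_data[ticker] = yf.download(ticker, start='2014-01-01', end='2024-08-19', auto_adjust=False)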


### Step 2: Calculating ROI Metrics

In [None]:

# Approximate number of trading days in a year
TRADING_DAYS = 252

def trailing_roi(prices, years):
    """ROI over roughly the last `years` years, or None if the history is too short."""
    n = TRADING_DAYS * years
    if len(prices) < n:
        return None
    return prices.iloc[-1] / prices.iloc[-n] - 1

rows = []
for ticker, data in etf_data.items():
    if data.empty:
        continue  # skip tickers whose download returned no data

    # Some yfinance versions return a one-column DataFrame here; squeeze() yields a Series
    prices = data['Adj Close'].squeeze().dropna()

    # YTD ROI: latest price relative to the first trading day of the same calendar year
    this_year = prices[prices.index.year == prices.index[-1].year]
    ytd_roi = prices.iloc[-1] / this_year.iloc[0] - 1

    # Fetch additional financial metrics from Yahoo Finance (may be missing for some ETFs)
    ticker_info = yf.Ticker(ticker).info
    pe_ratio = ticker_info.get('trailingPE', None)

    # Collect the calculated metrics for this ETF
    rows.append({
        'Ticker': ticker,
        'YTD_ROI': ytd_roi,
        '1Y_ROI': trailing_roi(prices, 1),
        '5Y_ROI': trailing_roi(prices, 5),
        '10Y_ROI': trailing_roi(prices, 10),
        'P/E Ratio': pe_ratio
    })

# Build the DataFrame once from the collected rows
roi_df = pd.DataFrame(rows, columns=[
    'Ticker', 'YTD_ROI', '1Y_ROI', '5Y_ROI', '10Y_ROI',
    'P/E Ratio'
])
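
Each ROI here follows the same formula: end price divided by start price, minus 1. The sketch below applies it to a short series of made-up prices (not real market data) to show how much the choice of start date matters. The helper name `roi` and the sample values are assumptions for illustration only.

```python
import pandas as pd

# Made-up prices on six consecutive business days spanning a year boundary
dates = pd.bdate_range('2023-12-27', periods=6)
prices = pd.Series([100.0, 102.0, 104.0, 110.0, 121.0, 132.0], index=dates)

def roi(start_price, end_price):
    """Return on investment as a fraction: 0.25 means +25%."""
    return end_price / start_price - 1

latest = prices.iloc[-1]

# ROI over the whole window: first available price to latest price
window_roi = roi(prices.iloc[0], latest)

# Year-to-date ROI: first price of the latest calendar year to latest price
this_year = prices[prices.index.year == prices.index[-1].year]
ytd_roi = roi(this_year.iloc[0], latest)

print(f"Whole-window ROI: {window_roi:.2%}")
print(f"YTD ROI: {ytd_roi:.2%}")
```

Positional access with `.iloc` is used because the Series is indexed by date, so `prices[-1]` would be read as a label lookup rather than "the last element".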


### Step 3: Filtering and Sorting the ETFs with Additional Metrics

In [None]:

# Filter out ETFs without a 10-year ROI
filtered_roi_df = roi_df.dropna(subset=['10Y_ROI'])

# Sort by YTD ROI (highest first), breaking ties with 5-year ROI (highest first)
sorted_roi_df = filtered_roi_df.sort_values(by=['YTD_ROI', '5Y_ROI'], ascending=[False, False])

print("ETFs ranked by YTD ROI, then by 5-year ROI:")
print(sorted_roi_df)
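
The same pattern extends to the other metrics. As a sketch that runs without a network connection, the example below uses a small table of made-up numbers for hypothetical tickers, keeps only rows with a 10-year history and a P/E ratio below an arbitrary cutoff of 30, and then ranks what is left. The cutoff and all values are assumptions, not recommendations.

```python
import pandas as pd

# Made-up metrics for hypothetical tickers; not real ETF data
metrics = pd.DataFrame({
    'Ticker': ['AAA', 'BBB', 'CCC', 'DDD'],
    'YTD_ROI': [0.12, 0.18, 0.18, 0.05],
    '5Y_ROI': [0.80, 0.60, 0.95, 0.40],
    '10Y_ROI': [1.90, None, 2.40, 1.10],
    'P/E Ratio': [24.0, 31.0, 27.5, None],
})

# Rows with a missing P/E ratio fail the comparison and are dropped as well
mask = metrics['10Y_ROI'].notna() & (metrics['P/E Ratio'] < 30)

shortlist = metrics[mask].sort_values(
    by=['YTD_ROI', '5Y_ROI'], ascending=[False, False]
)
print(shortlist)
```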
