Software Development Tools: Project

This project gives you additional practice with common software engineering tasks. These tasks complement your data skills and make you a more attractive candidate to potential employers.

In [4]:
import pandas as pd
import streamlit as st
import plotly.express as px
import matplotlib.pyplot as plt

In [5]:
# Load the dataset
df = pd.read_csv("vehicles_us.csv")

# Ensure 'model' column exists before splitting
if 'model' in df.columns:
    df[['make', 'model']] = df['model'].str.split(n=1, expand=True)

# Display the first 10 rows of the DataFrame
st.write(df.head(10))

# Remove rows with missing values in 'price' and 'model_year'
df = df.dropna(subset=['price', 'model_year'])

# Convert 'model_year' to integer (if it's not already)
df['model_year'] = df['model_year'].astype(int)

# Identify outliers based on price and model year
price_outlier_threshold = df['price'].quantile(0.95)
year_outlier_threshold = df['model_year'].quantile(0.95)

# Select the rows above each threshold (these are the outliers we plot)
price_outliers = df[df['price'] > price_outlier_threshold]
year_outliers = df[df['model_year'] > year_outlier_threshold]

# Function to create scatter plots
def create_scatter_plot(data, x_col, y_col, title):
    if not data.empty:
        fig = px.scatter(data, x=x_col, y=y_col, title=title)
        st.plotly_chart(fig)
    else:
        st.write(f"No data available for {title}")

# Scatter plot for model year outliers
create_scatter_plot(year_outliers, 'model_year', 'price', 'Model Year Outliers vs Price')

# Scatter plot for price outliers
create_scatter_plot(price_outliers, 'model_year', 'price', 'Price Outliers vs Model Year')

Variable Naming: Changed vehicle to vehicles in the print statement to match the DataFrame name.
Outlier Detection: Added logic that flags rows above the 95th percentile of price, and separately of model year, as outliers.
Scatter Plots: Created a function to generate scatter plots using Plotly for better integration with Streamlit.
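
The percentile-based outlier selection described in the notes above can also be sketched without Streamlit, which makes it easier to check in a plain Python session. This is a minimal sketch, not the project's own code: the helper name `select_upper_outliers` and the five-row sample are hypothetical and stand in for `vehicles_us.csv`.

```python
import pandas as pd


def select_upper_outliers(data: pd.DataFrame, column: str, q: float = 0.95) -> pd.DataFrame:
    """Return the rows whose value in `column` is strictly above the q-th quantile."""
    # Hypothetical helper: same rule as the notebook (value > 95th percentile)
    threshold = data[column].quantile(q)
    return data[data[column] > threshold]


# Made-up sample data, used only to illustrate the rule
sample = pd.DataFrame({
    "price": [1000, 2000, 3000, 4000, 100000],
    "model_year": [2001, 2005, 2010, 2012, 2018],
})

price_outliers = select_upper_outliers(sample, "price")
year_outliers = select_upper_outliers(sample, "model_year")

print(price_outliers)
print(year_outliers)
```

Because the rule only looks at the upper tail, it flags the most expensive and the newest vehicles; unusually cheap or old vehicles are not selected.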