Software Development Tools: Project

This project gives you additional practice with common software engineering tasks. These tasks complement your data skills and make you a more attractive candidate to potential employers.

In [36]:
import pandas as pd
import streamlit as st
import plotly.express as px
import altair as alt
import matplotlib.pyplot as plt
import numpy as np

In [37]:
# Load the dataset
vehicles = pd.read_csv('vehicles_us.csv')

# Split 'model' into 'make' and 'model'
vehicles[['make', 'model']] = vehicles['model'].str.split(n=1, expand=True)

# Display the first 10 rows of the DataFrame
st.write(vehicles.head(10))

# Compute the 95th-percentile thresholds for price and model year
price_outlier_threshold = vehicles['price'].quantile(0.95)
year_outlier_threshold = vehicles['model_year'].quantile(0.95)

# Keep only the rows above each threshold (the upper-tail outliers)
price_outliers = vehicles[vehicles['price'] > price_outlier_threshold]
year_outliers = vehicles[vehicles['model_year'] > year_outlier_threshold]

# Create scatter plots for model year and price outliers
def create_scatter_plot(data, x_col, y_col, title):
    fig = px.scatter(data, x=x_col, y=y_col, title=title)
    st.plotly_chart(fig)

# Scatter plot for model year outliers
create_scatter_plot(year_outliers, 'model_year', 'price', 'Scatter Plot of Model Year Outliers vs Price')

# Scatter plot for price outliers
create_scatter_plot(price_outliers, 'model_year', 'price', 'Scatter Plot of Price Outliers')

Variable Naming: Changed vehicle to vehicles in the display call (st.write) to match the DataFrame name.
Outlier Detection: Added logic that flags rows above the 95th percentile of price and of model year as outliers.
Scatter Plots: Created a function that builds scatter plots with Plotly, which Streamlit renders directly through st.plotly_chart.
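
The percentile-based outlier detection described in the notes can be tried outside Streamlit on a small table. The following is a minimal sketch, not the project's actual code: the toy DataFrame and the helper name upper_outliers are assumptions made for illustration, and only the quantile comparison mirrors the cell above.

```python
import pandas as pd

# Hypothetical toy data standing in for vehicles_us.csv (values are made up)
toy = pd.DataFrame({
    "price": [5000, 7000, 9000, 11000, 90000],
    "model_year": [2005.0, 2010.0, 2012.0, 2015.0, 2019.0],
})


def upper_outliers(df, column, q=0.95):
    """Return the rows whose value in `column` lies above the q-th quantile.

    Hypothetical helper: it wraps the same threshold comparison used in the
    notebook cell so the logic can be reused for any numeric column.
    """
    threshold = df[column].quantile(q)
    return df[df[column] > threshold]


price_outliers = upper_outliers(toy, "price")
year_outliers = upper_outliers(toy, "model_year")
print(price_outliers)
```

Note that this approach only flags the upper tail, and that rows with a missing value in the chosen column are never selected, because a comparison against NaN evaluates to False.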