# Team Project Assignment - Milestone 1

**Dataset**: [E-Scooter Trips – City of Austin (2018–2022)](https://data.austintexas.gov/Transportation-and-Mobility/Shared-Micromobility-Vehicle-Trips-2018-2022-/7d8e-dm7r/about_data)

**Prediction Task**: We aim to predict the **duration of an e-scooter trip** (in minutes) based on features such as start time, day of week, start location, and weather conditions. This is a **regression problem**.
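The target is a continuous number of minutes, so any model should be compared against a trivial baseline. The sketch below is our own illustration and not part of the plan above. It predicts the training-set mean for every trip and scores that prediction with mean absolute error (MAE). The choice of metric and the toy durations are assumptions.

```python
import numpy as np

def mean_baseline_mae(y_train, y_test):
    """MAE of always predicting the training-set mean duration (minutes)."""
    y_train = np.asarray(y_train, dtype=float)
    y_test = np.asarray(y_test, dtype=float)
    prediction = y_train.mean()  # one constant prediction for every trip
    return np.mean(np.abs(y_test - prediction))

# Made-up durations in minutes, not taken from the Austin data
train = [5.0, 10.0, 15.0]
test = [8.0, 12.0]
print(mean_baseline_mae(train, test))  # mean is 10, errors are 2 and 2 -> 2.0
```

A real model is only useful if its MAE on held-out trips is clearly below this number.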

**Planned Features**:
- Hour of day
- Day of week
- Month
- Start location (latitude/longitude or region/neighborhood)
- Trip distance (calculated from coordinates)
- Weather data (temperature, precipitation, wind) [to be added]
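Here is a minimal pandas sketch of how the planned features could be derived. It relies on several hypothetical column names. `start_time` matches the conversion step in this notebook. The per-trip coordinates `start_lat`/`start_lon`/`end_lat`/`end_lon` are assumed, and we have not confirmed that the export provides them (region-level fields may be all that is available). The weather table is assumed to be hourly, with columns `hour_ts` and `temp_c`. Distance uses the haversine (great-circle) formula, which is our choice here. It gives straight-line distance, so it will understate the route actually ridden.

```python
import numpy as np
import pandas as pd

def add_time_features(df, time_col="start_time"):
    """Derive hour of day, day of week (Monday=0) and month from a datetime column."""
    ts = pd.to_datetime(df[time_col])
    out = df.copy()
    out["hour"] = ts.dt.hour
    out["day_of_week"] = ts.dt.dayofweek
    out["month"] = ts.dt.month
    return out

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between points given in decimal degrees."""
    r = 6371.0  # mean Earth radius in km
    lat1, lon1, lat2, lon2 = map(np.radians, (lat1, lon1, lat2, lon2))
    a = (np.sin((lat2 - lat1) / 2) ** 2
         + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2) ** 2)
    return 2 * r * np.arcsin(np.sqrt(a))

def add_hourly_weather(trips, weather, time_col="start_time"):
    """Left-join hourly weather rows (hypothetical `hour_ts` key) onto trips by start hour."""
    out = trips.copy()
    out["start_hour"] = pd.to_datetime(out[time_col]).dt.floor("h")
    return out.merge(weather, left_on="start_hour", right_on="hour_ts", how="left")

# Toy example: coordinate columns and weather values are made up
trips = pd.DataFrame({
    "start_time": ["2019-06-01 08:15:00", "2019-06-01 09:40:00"],
    "start_lat": [30.2672, 30.2850],
    "start_lon": [-97.7431, -97.7335],
    "end_lat": [30.2747, 30.2672],
    "end_lon": [-97.7404, -97.7431],
})
weather = pd.DataFrame({
    "hour_ts": pd.to_datetime(["2019-06-01 08:00:00", "2019-06-01 09:00:00"]),
    "temp_c": [26.0, 28.5],
})

trips = add_time_features(trips)
trips["distance_km"] = haversine_km(trips["start_lat"], trips["start_lon"],
                                    trips["end_lat"], trips["end_lon"])
trips = add_hourly_weather(trips, weather)
print(trips[["hour", "day_of_week", "month", "distance_km", "temp_c"]])
```

The weather join uses a left merge, so a trip whose start hour has no weather row keeps `NaN` in `temp_c` and is not silently dropped.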

In [None]:
# Import basic libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

# For displaying all columns
pd.set_option('display.max_columns', None)

In [None]:
# Load the dataset (assumes the CSV has been downloaded locally; pd.read_csv also accepts a URL if the file is hosted online)
data = pd.read_csv("Shared_Micromobility_Vehicle_Trips.csv")

# Quick look at the dataset
data.head()

In [None]:
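from IPython.display import display

# Basic info (data.info() prints its output directly)
data.info()

# Check for null values (display() is needed because a notebook cell only shows its last expression automatically)
display(data.isnull().sum())

# Basic statistics
data.describe()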
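`data.isnull().sum()` reports raw counts. On a large table, percentages are easier to judge. The helper sketched below combines counts and percentages and sorts columns by the number of missing values. It runs on a made-up frame, and `trip_distance` is a hypothetical column name.

```python
import numpy as np
import pandas as pd

def missing_summary(df):
    """Count and percentage of missing values per column, most missing first."""
    counts = df.isnull().sum()
    summary = pd.DataFrame({
        "n_missing": counts,
        "pct_missing": 100 * counts / len(df),
    })
    return summary.sort_values("n_missing", ascending=False)

toy = pd.DataFrame({
    "start_time": ["2019-06-01 08:15:00", None, "2019-06-01 09:40:00", "2019-06-01 10:05:00"],
    "end_time": ["2019-06-01 08:30:00", None, None, "2019-06-01 10:20:00"],
    "trip_distance": [1200.0, 800.0, np.nan, 950.0],
})
print(missing_summary(toy))
```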


In [None]:
# Convert start and end times to datetime
data['start_time'] = pd.to_datetime(data['start_time'])
data['end_time'] = pd.to_datetime(data['end_time'])

# Create trip duration in minutes
data['trip_duration_min'] = (data['end_time'] - data['start_time']).dt.total_seconds() / 60

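# Drop trips with non-positive or missing durations (NaN > 0 evaluates to False);
# .copy() avoids SettingWithCopyWarning when new columns are added later
data = data[data['trip_duration_min'] > 0].copy()

# Histogram of trip durations
plt.hist(data['trip_duration_min'], bins=50)
plt.xlabel("Trip Duration (min)")
plt.ylabel("Number of Trips")
plt.title("Distribution of Trip Durations")
plt.show()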
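The duration logic can be checked in isolation on a toy frame before it is run on the full data. A few very long trips, if present, can compress a 50-bin histogram into one or two visible bars. One option, which is our own suggestion, is to bin only up to a high quantile and report how many trips fall above it. The 0.99 cutoff used here is an arbitrary choice.

```python
import numpy as np
import pandas as pd

def trip_durations_min(df, start_col="start_time", end_col="end_time"):
    """Trip duration in minutes, keeping only strictly positive durations."""
    start = pd.to_datetime(df[start_col])
    end = pd.to_datetime(df[end_col])
    minutes = (end - start).dt.total_seconds() / 60
    return minutes[minutes > 0]

toy = pd.DataFrame({
    "start_time": ["2019-06-01 08:00:00", "2019-06-01 08:00:00",
                   "2019-06-01 09:00:00", "2019-06-01 10:00:00"],
    "end_time":   ["2019-06-01 08:12:00", "2019-06-01 07:59:00",
                   "2019-06-01 09:30:00", "2019-06-01 15:00:00"],
})
durations = trip_durations_min(toy)  # -> 12.0, 30.0, 300.0 (the negative trip is dropped)

# Bin only up to a high quantile so a few very long trips do not squash the plot;
# in the notebook, the same cap can be passed as plt.hist(..., range=(0, cap)).
cap = durations.quantile(0.99)
counts, edges = np.histogram(durations, bins=50, range=(0, cap))
n_above = int((durations > cap).sum())
print(f"cap = {cap:.1f} min, trips above cap: {n_above}")
```

If the clipped view is used, the number of trips above the cap should be reported alongside the plot so that the tail is not hidden entirely.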
