## 📘 Introduction  
In this notebook, we’ll clean and prepare the **AAPL stock data** for modeling.  
We’ll handle missing values, check data consistency, and ensure the dataset is ready for analysis.

## ⚙️ Import Libraries and Load Data  
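We’ll import the required libraries and download daily AAPL prices from Yahoo Finance with `yfinance`. The requested range runs from 2018-01-01 up to, but not including, 2025-01-01.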

In [1]:
import yfinance as yf
import pandas as pd
import matplotlib.pyplot as plt
import warnings

# Silence library warnings to keep the notebook output readable
warnings.filterwarnings("ignore")

# Load data
ticker = "AAPL"
data = yf.download(ticker, start="2018-01-01", end="2025-01-01")
data.head()

[*********************100%***********************]  1 of 1 completed


| Date | (Close, AAPL) | (High, AAPL) | (Low, AAPL) | (Open, AAPL) | (Volume, AAPL) |
|---|---|---|---|---|---|
| 2018-01-02 | 40.380993 | 40.390372 | 39.677736 | 39.888715 | 102223600 |
| 2018-01-03 | 40.373962 | 40.917816 | 40.310672 | 40.444289 | 118071600 |
| 2018-01-04 | 40.561504 | 40.664649 | 40.338807 | 40.446638 | 89738400 |
| 2018-01-05 | 41.023315 | 41.110049 | 40.566199 | 40.657622 | 94640000 |
| 2018-01-08 | 40.870941 | 41.166308 | 40.772482 | 40.870941 | 82271200 |
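
The downloaded frame has two-level column labels (`Price`, `Ticker`), which can make later column access more verbose. This notebook does not flatten them, but for a single ticker one option is to drop the constant `Ticker` level. The sketch below uses a small hand-built stand-in for the downloaded data (the `sample` and `sample_flat` names are illustrative), so it runs without a network call.

```python
import pandas as pd

# Hand-built stand-in for the downloaded frame: two-level columns (Price, Ticker).
columns = pd.MultiIndex.from_product(
    [["Close", "High", "Low", "Open", "Volume"], ["AAPL"]],
    names=["Price", "Ticker"],
)
sample = pd.DataFrame(
    [
        [40.38, 40.39, 39.68, 39.89, 102223600],
        [40.37, 40.92, 40.31, 40.44, 118071600],
    ],
    index=pd.to_datetime(["2018-01-02", "2018-01-03"]),
    columns=columns,
)
sample.index.name = "Date"

# Drop the constant "Ticker" level so columns can be accessed as sample_flat["Close"].
sample_flat = sample.droplevel("Ticker", axis=1)
print(sample_flat.columns.tolist())
```

Applied to the real data, the same call would be `data.droplevel("Ticker", axis=1)`, assuming the column levels carry the names shown in the output above.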


## 🔍 Check for Missing Values  
We’ll inspect the dataset for any missing or null entries.

In [2]:
missing_values = data.isnull().sum()
print("Missing values per column:\n", missing_values)

Missing values per column:
 Price   Ticker
Close   AAPL      0
High    AAPL      0
Low     AAPL      0
Open    AAPL      0
Volume  AAPL      0
dtype: int64
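
The per-column count says how many values are missing but not where they are. As a minimal sketch on toy data (the `toy` frame and its values are made up for illustration), the rows that contain at least one gap can be selected like this:

```python
import numpy as np
import pandas as pd

# Toy frame with two deliberately missing entries.
toy = pd.DataFrame(
    {"Close": [10.0, np.nan, 10.4], "Volume": [100.0, 120.0, np.nan]},
    index=pd.to_datetime(["2024-01-02", "2024-01-03", "2024-01-04"]),
)

# Count of missing values per column, as in the cell above.
missing_per_column = toy.isna().sum()

# Rows where any column is missing, to see which dates are affected.
rows_with_gaps = toy[toy.isna().any(axis=1)]

print(missing_per_column)
print(rows_with_gaps)
```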


## 🧹 Handle Missing Values  
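If any values were missing, we would forward-fill them (carry the last known value forward) and then backward-fill whatever gaps remain at the start of the series. The check above found no missing values, so this step leaves the data unchanged and serves only as a safeguard.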

In [3]:
data = data.ffill().bfill()
print("✅ Missing values handled successfully!")

✅ Missing values handled successfully!
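
Because the AAPL data has no gaps, the effect of `ffill().bfill()` is not visible here. The sketch below shows it on a made-up price series (the `prices` values are illustrative, not real quotes): forward-fill alone cannot repair a gap at the very start, which is why a backward-fill follows.

```python
import numpy as np
import pandas as pd

# Made-up closing prices with a leading gap and a two-day gap in the middle.
prices = pd.Series(
    [np.nan, 10.0, np.nan, np.nan, 10.6],
    index=pd.to_datetime(
        ["2024-01-02", "2024-01-03", "2024-01-04", "2024-01-05", "2024-01-08"]
    ),
    name="Close",
)

# Forward-fill carries the last known value forward; the first row stays NaN.
forward_only = prices.ffill()

# Backward-fill then covers the leading gap with the next known value.
filled = prices.ffill().bfill()

print(forward_only.tolist())
print(filled.tolist())
```

Note that the backward-fill copies a later value into an earlier date. For time-series modeling this can leak future information, so dropping leading gaps may be preferable in some setups.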


## 📅 Check Index and Data Types  
We’ll make sure the index is a `DatetimeIndex` and that every column has a numeric data type.

In [4]:
data.index = pd.to_datetime(data.index)
print(data.info())

<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 1761 entries, 2018-01-02 to 2024-12-31
Data columns (total 5 columns):
 #   Column          Non-Null Count  Dtype  
---  ------          --------------  -----  
 0   (Close, AAPL)   1761 non-null   float64
 1   (High, AAPL)    1761 non-null   float64
 2   (Low, AAPL)     1761 non-null   float64
 3   (Open, AAPL)    1761 non-null   float64
 4   (Volume, AAPL)  1761 non-null   int64  
dtypes: float64(4), int64(1)
memory usage: 82.5 KB
None
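
Beyond dtypes, a few structural checks can back up the consistency claim. The helper below is a sketch, not part of the original notebook: `check_ohlcv` is a hypothetical name, and it assumes a frame with flat `High`, `Low`, and `Volume` columns (for the downloaded data, the `Ticker` column level would need to be dropped first).

```python
import pandas as pd


def check_ohlcv(frame: pd.DataFrame) -> dict:
    """Return boolean sanity checks for a flat-column OHLCV frame."""
    return {
        "datetime_index": isinstance(frame.index, pd.DatetimeIndex),
        "sorted_index": bool(frame.index.is_monotonic_increasing),
        "unique_dates": bool(frame.index.is_unique),
        "high_ge_low": bool((frame["High"] >= frame["Low"]).all()),
        "non_negative_volume": bool((frame["Volume"] >= 0).all()),
    }


# Two rows with rounded values, used only to exercise the helper.
toy = pd.DataFrame(
    {
        "Open": [39.89, 40.44],
        "High": [40.39, 40.92],
        "Low": [39.68, 40.31],
        "Close": [40.38, 40.37],
        "Volume": [102223600, 118071600],
    },
    index=pd.to_datetime(["2018-01-02", "2018-01-03"]),
)

checks = check_ohlcv(toy)
print(checks)
```

Any check that comes back `False` points to rows worth inspecting before modeling.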


## 🧾 Summary  
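- No missing values were found, so the forward/backward fill left the data unchanged.  
- The index is a `DatetimeIndex` with 1,761 entries from 2018-01-02 to 2024-12-31.  
- Price columns are `float64` and volume is `int64`; the data is ready for modeling in the next step.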