

## Stock Market Predictor for Nifty

This notebook aims to demonstrate the process of creating a simple stock market predictor for the Nifty index using historical data and a machine learning model.

**Disclaimer:** Stock market prediction is inherently risky, and this notebook is for educational purposes only. It should not be considered financial advice.

### 1. Data Acquisition

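We start with historical Nifty price data. This example uses a hard-coded sample DataFrame as a placeholder: 25 consecutive calendar days of illustrative open, high, low, close, and volume values (weekends included, which real trading data would not contain). In a real-world scenario, you would replace this with code that fetches data from a reliable source (e.g., a financial data API).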

In [None]:
import pandas as pd

# Placeholder for data loading
# In a real application, you would fetch data from a source like a financial API
# For demonstration, let's create a sample dataframe with more data points
data = {
    'Date': pd.to_datetime(['2023-01-01', '2023-01-02', '2023-01-03', '2023-01-04', '2023-01-05',
                            '2023-01-06', '2023-01-07', '2023-01-08', '2023-01-09', '2023-01-10',
                            '2023-01-11', '2023-01-12', '2023-01-13', '2023-01-14', '2023-01-15',
                            '2023-01-16', '2023-01-17', '2023-01-18', '2023-01-19', '2023-01-20',
                            '2023-01-21', '2023-01-22', '2023-01-23', '2023-01-24', '2023-01-25']),
    'Open': [18000, 18100, 18050, 18150, 18200, 18180, 18220, 18250, 18280, 18300,
             18290, 18310, 18330, 18350, 18340, 18360, 18380, 18400, 18390, 18410,
             18430, 18450, 18440, 18460, 18480],
    'High': [18100, 18150, 18100, 18200, 18250, 18230, 18270, 18300, 18330, 18350,
             18340, 18360, 18380, 18400, 18390, 18410, 18430, 18450, 18440, 18460,
             18480, 18500, 18490, 18510, 18530],
    'Low': [17950, 18050, 18000, 18100, 18150, 18130, 18170, 18200, 18230, 18250,
            18240, 18260, 18280, 18300, 18290, 18310, 18330, 18350, 18340, 18360,
            18380, 18400, 18390, 18410, 18430],
    'Close': [18050, 18100, 18050, 18150, 18200, 18180, 18220, 18250, 18280, 18300,
              18290, 18310, 18330, 18350, 18340, 18360, 18380, 18400, 18390, 18410,
              18430, 18450, 18440, 18460, 18480],
    'Volume': [100000, 120000, 110000, 130000, 140000, 135000, 145000, 150000, 155000, 160000,
               158000, 162000, 165000, 170000, 168000, 172000, 175000, 180000, 178000, 182000,
               185000, 190000, 188000, 192000, 195000]
}
nifty_df = pd.DataFrame(data)

# Set 'Date' as the index for time series analysis
nifty_df.set_index('Date', inplace=True)

# Display the first few rows of the dataframe to verify the data loading and indexing
display(nifty_df.head())

Date,Open,High,Low,Close,Volume
2023-01-01,18000,18100,17950,18050,100000
2023-01-02,18100,18150,18050,18100,120000
2023-01-03,18050,18100,18000,18050,110000
2023-01-04,18150,18200,18100,18150,130000
2023-01-05,18200,18250,18150,18200,140000
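
The hard-coded placeholder above could be swapped for a file load. The following is a minimal sketch, assuming the data arrives as a CSV with the same column names as the sample DataFrame; the inline `csv_text` string stands in for a hypothetical file exported from a data provider, and `loaded_df` is a name introduced here for illustration.

```python
import io

import pandas as pd

# Hypothetical CSV export; rows are deliberately out of order to show the sort step.
csv_text = """Date,Open,High,Low,Close,Volume
2023-01-03,18050,18100,18000,18050,110000
2023-01-02,18100,18150,18050,18100,120000
2023-01-04,18150,18200,18100,18150,130000
"""

# parse_dates converts the Date column to datetimes; index_col makes it the index.
loaded_df = pd.read_csv(io.StringIO(csv_text), parse_dates=["Date"], index_col="Date")

# Sort chronologically, because rolling windows and unshuffled splits depend on row order.
loaded_df = loaded_df.sort_index()

print(loaded_df)
```

With a real file, `io.StringIO(csv_text)` would be replaced by the file path; the rest of the notebook could then use the loaded frame in place of `nifty_df`.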


### 2. Data Preprocessing

Now, we will preprocess the data. This might involve handling missing values, if any, and potentially scaling the data. For this sample data, we will just check for missing values.

In [None]:
# Check for missing values in the dataframe
print("Missing values before handling:")
print(nifty_df.isnull().sum())

# In a real scenario, you might handle missing values with a forward fill, backward fill, or interpolation.
# fillna(method='ffill') is deprecated in recent pandas releases; DataFrame.ffill() is the direct equivalent.
# nifty_df.ffill(inplace=True)

Missing values before handling:
Open      0
High      0
Low       0
Close     0
Volume    0
dtype: int64
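
Since the sample data has no gaps, the handling step is never exercised. As a rough sketch of what it might look like, the example below forward-fills a small frame with two deliberately missing values and then applies min-max scaling. The `prices` frame and its values are made up for illustration, and min-max scaling is just one of several possible scaling choices.

```python
import numpy as np
import pandas as pd

# Illustrative frame with two gaps (values are made up for this sketch).
prices = pd.DataFrame(
    {
        "Close": [18050.0, np.nan, 18050.0, 18150.0],
        "Volume": [100000.0, 120000.0, np.nan, 130000.0],
    },
    index=pd.to_datetime(["2023-01-02", "2023-01-03", "2023-01-04", "2023-01-05"]),
)

# Forward fill: each gap takes the most recent earlier value in its column.
filled = prices.ffill()

# Min-max scaling maps every column onto the range [0, 1].
col_min = filled.min()
col_max = filled.max()
scaled = (filled - col_min) / (col_max - col_min)

print(filled)
print(scaled)
```

In a modelling workflow, `col_min` and `col_max` would normally be computed on the training rows only and then reused for the test rows, so that no information from the test period influences the scaling.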


### 3. Feature Engineering

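Let's create some features that could be useful for prediction. As examples, we add 5-day and 20-day simple moving averages of the closing price. A 20-day window needs 20 rows before it produces a value, so only the last 6 of the 25 sample rows keep a complete set of features.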

In [None]:
# Create a 5-day moving average of the 'Close' price
nifty_df['MA_5'] = nifty_df['Close'].rolling(window=5).mean()

# Create a 20-day moving average of the 'Close' price
nifty_df['MA_20'] = nifty_df['Close'].rolling(window=20).mean()

# Drop rows with NaN values resulting from moving average calculation
# These NaN values occur at the beginning of the dataframe where there isn't enough data for the moving average window
nifty_df.dropna(inplace=True)

# Display the first few rows of the dataframe after adding moving averages and dropping NaNs
display(nifty_df.head())

Date,Open,High,Low,Close,Volume,MA_5,MA_20
2023-01-20,18410,18460,18360,18410,182000,18388.0,18267.0
2023-01-21,18430,18480,18380,18430,185000,18402.0,18286.0
2023-01-22,18450,18500,18400,18450,190000,18416.0,18303.5
2023-01-23,18440,18490,18390,18440,188000,18424.0,18323.0
2023-01-24,18460,18510,18410,18460,192000,18438.0,18338.5
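
Moving averages are only one kind of derived feature. The sketch below shows a few other common ones built with standard pandas calls: a one-day return, a lagged close, and a rolling mean with `min_periods` so that early rows are not lost. The short `close` series reuses the first six closing prices from the sample data; the column names are choices made for this illustration.

```python
import pandas as pd

# First six closing prices from the sample data, on business-day dates.
close = pd.Series(
    [18050, 18100, 18050, 18150, 18200, 18180],
    index=pd.date_range("2023-01-02", periods=6, freq="B"),
    name="Close",
)

features = pd.DataFrame({"Close": close})

# Percentage change from the previous row.
features["Return_1d"] = close.pct_change()

# Yesterday's close, aligned to today's row.
features["Close_lag1"] = close.shift(1)

# Strict 3-day average: NaN until three rows are available.
features["MA_3"] = close.rolling(window=3).mean()

# Same window, but averages whatever rows exist so far instead of returning NaN.
features["MA_3_partial"] = close.rolling(window=3, min_periods=1).mean()

print(features)
```

Using `min_periods=1` trades a shorter warm-up period for averages that are computed over fewer points in the first rows, which may or may not be acceptable depending on the model.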


### 4. Model Selection and Training

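For this example, we'll use a simple Linear Regression model to predict the 'Close' price from the same day's 'Open', 'High', 'Low', and 'Volume' plus the two moving averages. Note that these inputs are only known once the trading day is over, and both moving averages already include that day's close, so the model reconstructs a close from same-day information and does not forecast a future one. In a real-world application, you would predict a future value from past data and explore more sophisticated time series models.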

In [None]:
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

# Define features (X) and target (y) for the model
# X includes the columns we will use to predict the 'Close' price
X = nifty_df[['Open', 'High', 'Low', 'Volume', 'MA_5', 'MA_20']]
# y is the target variable, which is the 'Close' price
y = nifty_df['Close']

# Split data into training and testing sets
# test_size=0.2 means 20% of the data will be used for testing
# shuffle=False is important for time series data to maintain the temporal order
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, shuffle=False)

# Initialize the Linear Regression model
model = LinearRegression()
# Train the model using the training data
model.fit(X_train, y_train)
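
One way to turn this setup into an actual forecast is to shift the target so that each row's features predict the next day's close, and to validate with splits that always train on earlier rows. The following is a sketch under stated assumptions, not the method used in this notebook: the prices are synthetic random-walk values generated for illustration, and the `Target` column and the two-feature set are choices made for this example.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import TimeSeriesSplit

# Synthetic drifting prices (illustrative only, not real Nifty data).
rng = np.random.default_rng(0)
close = pd.Series(18000 + np.cumsum(rng.normal(5, 20, size=60)), name="Close")

frame = pd.DataFrame({"Close": close})
frame["MA_5"] = close.rolling(window=5).mean()

# Target is the NEXT row's close, so every feature is known before the target.
frame["Target"] = close.shift(-1)

# Drop the warm-up rows (no MA_5 yet) and the last row (no next-day close).
frame = frame.dropna()

X = frame[["Close", "MA_5"]]
y = frame["Target"]

# Each fold trains on an earlier block of rows and tests on the block after it.
splitter = TimeSeriesSplit(n_splits=3)
fold_mse = []
for train_idx, test_idx in splitter.split(X):
    fold_model = LinearRegression().fit(X.iloc[train_idx], y.iloc[train_idx])
    fold_pred = fold_model.predict(X.iloc[test_idx])
    fold_mse.append(float(np.mean((y.iloc[test_idx].to_numpy() - fold_pred) ** 2)))

print(fold_mse)
```

Compared with a single unshuffled split, `TimeSeriesSplit` gives several out-of-sample scores while still keeping every test block later in time than its training block.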

### 5. Model Evaluation

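Now, let's evaluate the trained model's performance on the test set using Mean Squared Error (MSE), the average squared difference between actual and predicted prices. With this sample, the test set holds only the last two of the six rows left after feature engineering, so the score is illustrative and not a reliable estimate.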

In [None]:
# Make predictions on the test set using the trained model
y_pred = model.predict(X_test)

# Evaluate the model's performance using Mean Squared Error (MSE)
# MSE measures the average of the squared differences between actual and predicted values
mse = mean_squared_error(y_test, y_pred)
print(f"Mean Squared Error: {mse}")

Mean Squared Error: 0.012888764830631791
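
MSE is expressed in squared index points, which is hard to read on its own. As a sketch of how the number could be put in context, the example below recomputes MSE from the two test rows, adds RMSE and MAE (both in index points), and compares against a trivial baseline that predicts `Close = Open`. The predicted values are copied from the rounded notebook output, so the recomputed MSE matches the printed one only approximately; the baseline is an addition made for this illustration.

```python
import numpy as np

# Actual and predicted closes for the two test rows (predictions as displayed, rounded).
actual = np.array([18460.0, 18480.0])
predicted = np.array([18460.053691, 18480.15131])

# Same-day Open for the two test rows in the sample data.
open_price = np.array([18460.0, 18480.0])

errors = predicted - actual
mse = float(np.mean(errors ** 2))       # squared index points
rmse = float(np.sqrt(mse))              # index points
mae = float(np.mean(np.abs(errors)))    # index points

# Trivial baseline: assume the close equals the same day's open.
baseline_mse = float(np.mean((open_price - actual) ** 2))

print(f"MSE: {mse:.6f}, RMSE: {rmse:.6f}, MAE: {mae:.6f}")
print(f"Baseline (Close = Open) MSE: {baseline_mse:.6f}")
```

In this sample the open and close are identical on both test days, so the baseline has zero error. That suggests the model's low MSE comes mostly from the same-day 'Open' feature and says little about predictive skill.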


### 6. Prediction

We can now use the trained model to make predictions. For a real-world scenario, you would use new, unseen data for prediction. Here, we will demonstrate by predicting on the test set and comparing with the actual values.

In [None]:
# Create a DataFrame to display actual vs. predicted prices
predictions_df = pd.DataFrame({'Actual Close': y_test, 'Predicted Close': y_pred})
# Display the DataFrame
display(predictions_df)

Date,Actual Close,Predicted Close
2023-01-24,18460,18460.053691
2023-01-25,18480,18480.15131
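
Predicting on genuinely new data means building a row with the same feature columns, in the same order, that the model was trained on. The sketch below shows that pattern on a reduced two-feature model so that it stays self-contained. The training rows are taken from the sample data for 2023-01-20 to 2023-01-23, while `new_row` is a hypothetical observation invented for illustration.

```python
import pandas as pd
from sklearn.linear_model import LinearRegression

# Columns the model is trained on; new data must provide the same ones.
feature_cols = ["Open", "MA_5"]

# Rows for 2023-01-20 to 2023-01-23 from the sample data.
train = pd.DataFrame(
    {
        "Open": [18410, 18430, 18450, 18440],
        "MA_5": [18388.0, 18402.0, 18416.0, 18424.0],
        "Close": [18410, 18430, 18450, 18440],
    }
)

small_model = LinearRegression().fit(train[feature_cols], train["Close"])

# Hypothetical new observation (values invented for this sketch).
new_row = pd.DataFrame({"Open": [18500], "MA_5": [18452.0]})

# Select columns by name so the order matches what the model saw during fit.
predicted_close = float(small_model.predict(new_row[feature_cols])[0])

print(f"Predicted close: {predicted_close:.2f}")
```

Passing a DataFrame with named columns, instead of a bare array, lets scikit-learn check the feature names against those seen during fitting, which helps catch mismatched or reordered columns.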


### 7. Conclusion

We have built a simple Linear Regression model to predict Nifty prices from sample data. This notebook provides only a basic framework: accurate stock market prediction is challenging and requires more advanced techniques, far more historical data than the 25 sample rows used here, and evaluation that respects the time order of the data.