In [2]:
import pandas as pd
import numpy as np

In [3]:
df=pd.read_csv("/content/NCCL Historical Data.csv")

In [4]:
df.isna().sum()

Unnamed: 0,0
Date,0
Price,0
Open,0
High,0
Low,0
Vol.,1
Change %,0


In [5]:
df=df[['Price','Open','Low','High']]

In [6]:
df

Unnamed: 0,Price,Open,Low,High
0,229.02,229.00,228.11,230.01
1,229.30,226.58,221.89,229.74
2,226.58,226.00,223.81,229.10
3,231.55,238.00,230.77,238.90
4,238.14,239.00,235.87,240.55
...,...,...,...,...
4310,212.84,220.28,209.93,220.34
4311,216.21,220.34,214.59,224.59
4312,224.23,219.67,219.67,227.62
4313,217.34,223.40,215.44,227.84


# Task
Predict the `Low` price from the `Price`, `Open`, and `High` columns of the dataset "stock_price.csv".

## Prepare the data

### Subtask:
Define the features (X) and target (y) variables.


**Reasoning**:
Define the features (X) and target (y) variables as instructed.



In [7]:
X = df[['Price', 'Open', 'High']]
y = df['Low']

## Split the data

### Subtask:
Split the data into training and testing sets.


**Reasoning**:
Split the data into training and testing sets using train_test_split.



In [8]:
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
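
The split above shuffles rows before dividing them. For dated price history, a chronological hold-out is a common alternative, since it tests on rows that come after the training rows. A minimal sketch, assuming the rows are sorted oldest to newest (the notebook does not confirm the ordering) and using a synthetic frame `demo` as a hypothetical stand-in for `df`:

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Synthetic stand-in for df (hypothetical values, not the NCCL data).
rng = np.random.default_rng(0)
close = 100 + rng.normal(0, 1, 200).cumsum()
demo = pd.DataFrame({
    "Price": close,
    "Open": close + rng.normal(0, 0.5, 200),
})
demo["High"] = demo[["Price", "Open"]].max(axis=1) + rng.uniform(0, 1, 200)
demo["Low"] = demo[["Price", "Open"]].min(axis=1) - rng.uniform(0, 1, 200)

X_demo = demo[["Price", "Open", "High"]]
y_demo = demo["Low"]

# shuffle=False keeps the row order, so the last 20% of rows form the test set.
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, shuffle=False
)

print(len(X_tr), len(X_te))
print(X_tr.index.max() < X_te.index.min())  # every test row follows every training row
```

With `shuffle=False`, `random_state` has no effect and can be dropped.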

## Train the model

### Subtask:
Train a regression model on the training data.


**Reasoning**:
Import and train a Linear Regression model on the training data.



In [9]:
from sklearn.linear_model import LinearRegression

model = LinearRegression()
model.fit(X_train, y_train)
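
After fitting, a `LinearRegression` object exposes the learned weights as `coef_` (one per feature, in column order) and the constant term as `intercept_`. The sketch below shows this on synthetic data generated from a known linear rule; the names `X_demo`, `y_demo`, and `demo_model` are illustrative and not part of the notebook.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic data following a known rule: y = 2*x0 - 1*x1 + 0.5*x2 + 3
rng = np.random.default_rng(42)
X_demo = rng.normal(size=(100, 3))
true_weights = np.array([2.0, -1.0, 0.5])
y_demo = X_demo @ true_weights + 3.0

demo_model = LinearRegression().fit(X_demo, y_demo)

# coef_ follows the column order of the training features.
for name, coef in zip(["x0", "x1", "x2"], demo_model.coef_):
    print(f"{name}: {coef:.3f}")
print(f"intercept: {demo_model.intercept_:.3f}")
```

For the notebook's `model`, the same attributes would list the weights for 'Price', 'Open', and 'High' in that order.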

## Evaluate the model

### Subtask:
Evaluate the model's performance on the testing data.


**Reasoning**:
Evaluate the trained model's performance on the testing data by calculating the Mean Squared Error and R-squared score.



In [10]:
from sklearn.metrics import mean_squared_error, r2_score

y_pred = model.predict(X_test)

mse = mean_squared_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)

print(f"Mean Squared Error (MSE): {mse}")
print(f"R-squared Score (R2): {r2}")

Mean Squared Error (MSE): 2.3910551148074117
R-squared Score (R2): 0.9994565611859261
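
For reference, both metrics can be computed by hand: MSE is the mean of the squared residuals, and R-squared is one minus the ratio of the residual sum of squares to the total sum of squares around the mean of the true values. The sketch below uses a small made-up example (the helper `mse_and_r2` is hypothetical, not a scikit-learn function) and also converts the MSE printed above into an RMSE, which is in the same units as the prices.

```python
import numpy as np

def mse_and_r2(y_true, y_hat):
    """Return (MSE, R^2) computed directly from their definitions."""
    y_true = np.asarray(y_true, dtype=float)
    y_hat = np.asarray(y_hat, dtype=float)
    ss_res = np.sum((y_true - y_hat) ** 2)          # residual sum of squares
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)  # total sum of squares
    return float(ss_res / y_true.size), float(1.0 - ss_res / ss_tot)

mse_demo, r2_demo = mse_and_r2([3.0, -0.5, 2.0, 7.0], [2.5, 0.0, 2.0, 8.0])
print(mse_demo)            # 0.375
print(round(r2_demo, 4))   # 0.9486

# RMSE for the MSE reported by the notebook's test run.
rmse_notebook = float(np.sqrt(2.3910551148074117))
print(round(rmse_notebook, 3))  # 1.546
```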


## Make predictions

### Subtask:
Use the trained model to make predictions on new data.


**Reasoning**:
Use the trained model to make predictions on the testing features and store them in a variable.



In [11]:
predictions = model.predict(X_test)
print(predictions)

[ 96.50462989  95.49408205  84.97071381  95.36155183  60.35689009
 116.49360537  82.24307094  84.40537417  76.3577572   59.99272033
  21.76890094  49.41686606  83.99010571 112.92212713  73.78348007
  43.00929442  90.91460192  10.67081306 123.26443075  95.73767532
 149.785876   103.23042722  37.60255157  83.51535589  78.64581418
 252.81628025  34.7751762  228.30781392  74.54466776  45.67542304
 236.11561781  76.47684621  25.16801724  71.14685232  86.37023062
  99.50885407  78.17154877  70.62580016  10.87859916  72.09287163
  16.39490887 173.50992809  69.21133615  39.11868593 302.64943642
  80.31103278  89.02488689 109.38639676  70.92690941  81.6845912
  73.58384979 121.10894495  90.48665284 274.95931004  78.91295227
 261.97139989 173.28928286  76.39085214  93.30954647 122.51755283
  59.7832739   22.48648982 275.76922347 230.97700786  65.00580241
 121.25613983  56.59202784 139.04738599 304.71759496  96.39361778
 102.48484661  85.61789381  48.08352881  84.09548024  82.33295113
  96.700736

## Summary:

### Data Analysis Key Findings

*   The data was successfully split into training and testing sets with a test size of 0.2.
*   A Linear Regression model was trained on the training data.
*   The Mean Squared Error (MSE) of the model on the testing data is approximately 2.39.
*   The R-squared score (R2) of the model on the testing data is approximately 0.999.
*   The trained model was used to make predictions on the testing data.

### Insights or Next Steps

*   The R-squared score of about 0.999 shows that the model fits the testing data very well. 'Low' is recorded for the same trading day as 'Price', 'Open', and 'High', so a close relationship between the four columns is expected in this dataset.
*   The model could be used to predict 'Low' for new stock data, but it should first be tested on rows from a later period than the training rows. The random split above mixes dates, so the test score may overstate how well the model generalizes over time.
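
One way to check generalization over time is cross-validation with `TimeSeriesSplit`, where each fold trains on earlier rows and scores on the rows that follow. This is a sketch on synthetic prices, assuming rows are in chronological order; the arrays below are hypothetical stand-ins for the notebook's columns.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import TimeSeriesSplit, cross_val_score

# Synthetic OHLC-like series (hypothetical, not the NCCL data).
rng = np.random.default_rng(1)
n = 300
close = 100 + rng.normal(0, 1, n).cumsum()
open_ = close + rng.normal(0, 0.5, n)
high = np.maximum(close, open_) + rng.uniform(0, 1, n)
low = np.minimum(close, open_) - rng.uniform(0, 1, n)

X_demo = np.column_stack([close, open_, high])

# Each split trains on a prefix of the rows and tests on the block after it.
tscv = TimeSeriesSplit(n_splits=5)
scores = cross_val_score(
    LinearRegression(), X_demo, low, cv=tscv, scoring="neg_mean_squared_error"
)
fold_mse = -scores  # scikit-learn reports negated MSE so that higher is better
print(fold_mse)
```

A large gap between these fold scores and the shuffled-split MSE would suggest that the random split was optimistic.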


# Task
Predict the `Low` price from the `Price`, `Open`, and `High` columns in the file "stock_price.csv". Tune the Linear Regression model with a hyperparameter search to reduce the prediction error.

## Define the model and hyperparameters

### Subtask:
Define the Linear Regression model and the hyperparameters to tune.


**Reasoning**:
Define the Linear Regression model and the hyperparameters to tune as instructed.



In [12]:
from sklearn.linear_model import LinearRegression

model = LinearRegression()

param_grid = {'fit_intercept': [True, False]}

## Define the search space

### Subtask:
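Define the search space for the hyperparameters.


**Reasoning**:
The search space is already defined by `param_grid` in the previous step: `fit_intercept` is either `True` or `False`, so no further code is needed here.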


## Perform hyperparameter tuning

### Subtask:
Use a hyperparameter tuning technique (e.g., GridSearchCV, RandomizedSearchCV) to find the best hyperparameters.


**Reasoning**:
Import GridSearchCV, instantiate it with the model and parameter grid, fit it to the training data, and then print the best parameters found.



In [13]:
from sklearn.model_selection import GridSearchCV

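# No `scoring` argument is given, so candidates are ranked by the
# estimator's default score, which is R-squared for regressors.
grid_search = GridSearchCV(model, param_grid, cv=5)
grid_search.fit(X_train, y_train)

# refit=True by default, so best_estimator_ is retrained on all of X_train.
best_model = grid_search.best_estimator_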

print(f"Best hyperparameters: {grid_search.best_params_}")

Best hyperparameters: {'fit_intercept': False}
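
`best_params_` only reports the winner. The per-candidate cross-validation scores live in `cv_results_`, which shows how far apart the candidates actually were. The sketch below runs the same two-value grid on synthetic data and sets `scoring` to negated MSE explicitly, since the task is phrased in terms of prediction error; the data and the name `search` are illustrative assumptions.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import GridSearchCV

# Synthetic regression data (hypothetical, not the NCCL data).
rng = np.random.default_rng(0)
X_demo = rng.uniform(10, 300, size=(200, 3))
y_demo = X_demo @ np.array([0.3, 0.3, 0.4]) - 1.0 + rng.normal(0, 1.5, 200)

search = GridSearchCV(
    LinearRegression(),
    {"fit_intercept": [True, False]},
    cv=5,
    scoring="neg_mean_squared_error",  # rank candidates by MSE instead of R-squared
)
search.fit(X_demo, y_demo)

# One row per candidate, with the mean and spread of its fold scores.
results = pd.DataFrame(search.cv_results_)[
    ["param_fit_intercept", "mean_test_score", "std_test_score", "rank_test_score"]
]
print(results)
```

When the difference in `mean_test_score` is much smaller than `std_test_score`, the choice between the candidates is effectively noise.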


## Train the model with the best hyperparameters

### Subtask:
Train the Linear Regression model using the best hyperparameters found during tuning.


**Reasoning**:
Retrieve the best performing model from the grid_search object.



In [14]:
# The best_model is already available from the previous grid search step.
# No explicit action is needed in this step other than confirming its existence.
print("The best model has been retrieved from grid_search and stored in 'best_model'.")

The best model has been retrieved from grid_search and stored in 'best_model'.


## Evaluate the tuned model

### Subtask:
Evaluate the performance of the tuned model on the testing data.


**Reasoning**:
Evaluate the performance of the tuned model on the testing data by calculating and printing the Mean Squared Error (MSE) and R-squared (R2) score.



In [15]:
from sklearn.metrics import mean_squared_error, r2_score

y_pred_tuned = best_model.predict(X_test)

mse_tuned = mean_squared_error(y_test, y_pred_tuned)
r2_tuned = r2_score(y_test, y_pred_tuned)

print(f"Tuned Model Mean Squared Error (MSE): {mse_tuned}")
print(f"Tuned Model R-squared Score (R2): {r2_tuned}")

Tuned Model Mean Squared Error (MSE): 2.3911185722218717
Tuned Model R-squared Score (R2): 0.9994565467633301


## Compare the results

### Subtask:
Compare the performance of the tuned model with the initial model.


**Reasoning**:
Print the MSE and R2 scores for both the initial and tuned models to compare their performance.



In [16]:
print("Initial Model Performance:")
print(f"Mean Squared Error (MSE): {mse}")
print(f"R-squared Score (R2): {r2}")
print("\nTuned Model Performance:")
print(f"Mean Squared Error (MSE): {mse_tuned}")
print(f"R-squared Score (R2): {r2_tuned}")

Initial Model Performance:
Mean Squared Error (MSE): 2.3910551148074117
R-squared Score (R2): 0.9994565611859261

Tuned Model Performance:
Mean Squared Error (MSE): 2.3911185722218717
R-squared Score (R2): 0.9994565467633301


## Summary:

### Data Analysis Key Findings

*   The best hyperparameter found for the Linear Regression model using GridSearchCV was `{'fit_intercept': False}`.
*   The tuned Linear Regression model achieved a Mean Squared Error (MSE) of approximately 2.391 and an R-squared (R2) score of approximately 0.999 on the testing data.
*   Comparing the tuned model to the initial model, the MSE and R2 scores were nearly identical (initial model MSE: 2.39106, R2: 0.9994566; tuned model MSE: 2.39112, R2: 0.9994565). The tuned model's MSE is marginally higher, so tuning the `fit_intercept` parameter did not improve the prediction performance in this case.

### Insights or Next Steps

*   The high R-squared score suggests that 'Price', 'Open', and 'High' are strong predictors of 'Low'.
*   Consider exploring other hyperparameters or different regression models to potentially achieve further reductions in prediction error, although the current performance is already very good.
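
As one possible next step, a regularized linear model exposes a hyperparameter that actually changes the fit. The sketch below tunes the `alpha` of a `Ridge` regressor inside a scaling pipeline. The choice of Ridge, the `alpha` values, and the synthetic data are assumptions for illustration; nothing here indicates that Ridge would beat the plain linear model on the NCCL data.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic regression data (hypothetical, not the NCCL data).
rng = np.random.default_rng(0)
X_demo = rng.uniform(10, 300, size=(200, 3))
y_demo = X_demo @ np.array([0.3, 0.3, 0.4]) - 1.0 + rng.normal(0, 1.5, 200)

# make_pipeline names each step after its lowercased class name ("ridge").
pipe = make_pipeline(StandardScaler(), Ridge())
alpha_grid = {"ridge__alpha": [0.01, 0.1, 1.0, 10.0]}

ridge_search = GridSearchCV(pipe, alpha_grid, cv=5, scoring="neg_mean_squared_error")
ridge_search.fit(X_demo, y_demo)

print(ridge_search.best_params_)
print(-ridge_search.best_score_)  # mean cross-validated MSE of the best candidate
```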


In [21]:
def predict_low_price(price, open_price, high_price):
  """
  Predicts the low price based on user input using the tuned Linear Regression model.

  Args:
    price: The current price.
    open_price: The opening price.
    high_price: The high price.

  Returns:
    The predicted low price.
  """
  # Create a DataFrame from the input values
  input_data = pd.DataFrame([[price, open_price, high_price]], columns=['Price', 'Open', 'High'])

  # Make the prediction using the best model
  predicted_low = best_model.predict(input_data)

  return predicted_low[0]

# Example usage:
# Get user input for Price, Open, and High
price_input = float(input('Enter Price: '))
open_input = float(input('Enter Open: '))
high_input = float(input('Enter High: '))


# Predict the low price using user input
predicted_low_example = predict_low_price(price_input, open_input, high_input)
print(f"Predicted Low Price: {predicted_low_example}")

Enter Price: 229.02
Enter Open: 229.00
Enter High: 230.01
Predicted Low Price: 226.76841263206458
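
A linear model can return a value that is impossible for a real trading day, so it can help to validate the inputs and cap the output. The helpers below are hypothetical additions, not part of the notebook, and they assume that 'Price' is the day's closing price, so that the low cannot exceed the open or the close and the high cannot fall below them.

```python
def check_inputs(price, open_price, high_price):
    """Raise ValueError if the inputs cannot describe one trading day."""
    if min(price, open_price, high_price) <= 0:
        raise ValueError("prices must be positive")
    if high_price < max(price, open_price):
        raise ValueError("High cannot be below Price or Open")


def cap_predicted_low(predicted_low, price, open_price):
    """Cap the prediction at the smaller of Price and Open."""
    return min(predicted_low, price, open_price)


# The values from the example run above pass the check and are left unchanged.
check_inputs(229.02, 229.00, 230.01)
capped = cap_predicted_low(226.76841263206458, 229.02, 229.00)
print(capped)  # 226.76841263206458
```

These helpers could wrap a call to `predict_low_price`, checking the inputs before the prediction and capping the result after it.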
