# Task
Train a RandomForestRegressor model on the dataset located at "/content/tarding-dataset.csv" to predict the 'Close' price using 'Open', 'High', 'Low', 'Volume', and date-related features ('Day', 'Month', 'Weekday'). Save the trained model to a pickle file named "model.pkl" and identify the required input features for the model.

## Load the data

### Subtask:
Load the data from the specified CSV file into a pandas DataFrame.


**Reasoning**:
Load the data from the CSV file into a pandas DataFrame and display the first few rows.



In [None]:
import pandas as pd

df = pd.read_csv('/content/tarding-dataset.csv')
display(df.head())

Unnamed: 0,Date,Open,High,Low,Close,Volume
0,2023-01-01,100.0,105.0,95.0,100.0,1000
1,2023-01-02,102.5,108.5,93.9,101.8,1100
2,2023-01-03,105.0,112.0,92.8,103.6,1200
3,2023-01-04,107.5,106.5,91.7,105.4,1300
4,2023-01-05,110.0,110.0,96.6,107.2,1400
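
The `Unnamed: 0` column in the output above is usually a row index that was written into the CSV when it was saved. A minimal sketch, assuming that is the case for this file, reads that column back as the index and parses 'Date' while loading. The inline CSV text is only the five rows displayed above, standing in for the real file.

```python
import io

import pandas as pd

# The five rows displayed above, standing in for the real CSV file.
csv_text = """,Date,Open,High,Low,Close,Volume
0,2023-01-01,100.0,105.0,95.0,100.0,1000
1,2023-01-02,102.5,108.5,93.9,101.8,1100
2,2023-01-03,105.0,112.0,92.8,103.6,1200
3,2023-01-04,107.5,106.5,91.7,105.4,1300
4,2023-01-05,110.0,110.0,96.6,107.2,1400
"""

# index_col=0 treats the first (unnamed) column as the index instead of data;
# parse_dates converts 'Date' to a datetime column while reading.
sample_df = pd.read_csv(io.StringIO(csv_text), index_col=0, parse_dates=["Date"])

print(sample_df.columns.tolist())
print(sample_df["Date"].dtype)
```

The later cells select their feature columns by name, so the extra column does not reach the model either way.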


## Prepare the data

### Subtask:
Preprocess the data by converting the 'Date' column to datetime and extracting relevant features.


**Reasoning**:
Convert the 'Date' column to datetime objects and extract the day of the month, the month, and the weekday into new columns. pandas encodes the weekday as an integer from 0 (Monday) to 6 (Sunday).



In [None]:
df['Date'] = pd.to_datetime(df['Date'])
df['Day'] = df['Date'].dt.day
df['Month'] = df['Date'].dt.month
df['Weekday'] = df['Date'].dt.weekday
display(df.head())

Unnamed: 0,Date,Open,High,Low,Close,Volume,Day,Month,Weekday
0,2023-01-01,100.0,105.0,95.0,100.0,1000,1,1,6
1,2023-01-02,102.5,108.5,93.9,101.8,1100,2,1,0
2,2023-01-03,105.0,112.0,92.8,103.6,1200,3,1,1
3,2023-01-04,107.5,106.5,91.7,105.4,1300,4,1,2
4,2023-01-05,110.0,110.0,96.6,107.2,1400,5,1,3


## Split the data

### Subtask:
Split the data into training and testing sets.


**Reasoning**:
Define features (X) and target (y), then split the data into training and testing sets using train_test_split.



In [None]:
from sklearn.model_selection import train_test_split

X = df[['Open', 'High', 'Low', 'Volume', 'Day', 'Month', 'Weekday']]
y = df['Close']

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

print("Shape of X_train:", X_train.shape)
print("Shape of X_test:", X_test.shape)
print("Shape of y_train:", y_train.shape)
print("Shape of y_test:", y_test.shape)

Shape of X_train: (40, 7)
Shape of X_test: (11, 7)
Shape of y_train: (40,)
Shape of y_test: (11,)
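
The 40/11 split is not an exact 80/20 because of rounding. A short arithmetic check, assuming the dataset has 51 rows (the sum of the two shapes above) and that train_size is left unset as in the cell above:

```python
import math

n_samples = 51   # 40 + 11 rows, taken from the shapes printed above
test_size = 0.2

# For a fractional test_size, train_test_split rounds the test count up
# and assigns the remaining rows to the training set.
n_test = math.ceil(test_size * n_samples)
n_train = n_samples - n_test

print(n_train, n_test)  # 40 11
```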


## Train the model

### Subtask:
Train the RandomForestRegressor model using the training data.


**Reasoning**:
Instantiate a RandomForestRegressor with default hyperparameters and random_state=42 for reproducibility, then fit it on X_train and y_train.



In [None]:
from sklearn.ensemble import RandomForestRegressor

model = RandomForestRegressor(random_state=42)
model.fit(X_train, y_train)

## Save the model

### Subtask:
Save the trained model to a pickle file.


**Reasoning**:
Serialize the fitted model with pickle and write it to "model.pkl" in binary mode.



In [None]:
import pickle

with open("model.pkl", "wb") as f:
    pickle.dump(model, f)
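
One way to confirm that a pickled model survives the round trip is to compare predictions before and after reloading. The sketch below is illustrative only: it uses small synthetic data (an assumption, not the trading dataset) and serializes to bytes in memory instead of to a file.

```python
import pickle

import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Synthetic stand-in data (an assumption; not the trading dataset).
rng = np.random.default_rng(0)
X_demo = rng.random((30, 3))
y_demo = X_demo.sum(axis=1)

demo_model = RandomForestRegressor(n_estimators=10, random_state=42)
demo_model.fit(X_demo, y_demo)

# Serialize to bytes and back; pickle.dump / pickle.load on a file work the same way.
restored = pickle.loads(pickle.dumps(demo_model))

# The restored model should reproduce the original predictions exactly.
same = np.array_equal(demo_model.predict(X_demo), restored.predict(X_demo))
print(same)
```

Pickle files should only be loaded from trusted sources, and a pickled scikit-learn model is best loaded with the same scikit-learn version that saved it.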

## Identify inputs

### Subtask:
Determine and list the input features required by the trained model.


**Reasoning**:
Read the column names of X_train, which are the features the model was fitted on, and store them in a variable. Print the list to display the required inputs.



In [None]:
input_features = list(X_train.columns)
print("Input features required by the model:")
print(input_features)

Input features required by the model:
['Open', 'High', 'Low', 'Volume', 'Day', 'Month', 'Weekday']


## Summary:

### Data Analysis Key Findings

*   The dataset was loaded from "/content/tarding-dataset.csv".
*   The 'Date' column was converted to datetime objects, and 'Day', 'Month', and 'Weekday' features were extracted.
*   The data was split into training and testing sets with test_size=0.2. With 51 rows, this gave a training set of 40 samples and a testing set of 11 samples (about 78% and 22%), each with 7 features.
*   A RandomForestRegressor model was trained using 'Open', 'High', 'Low', 'Volume', 'Day', 'Month', and 'Weekday' as features and 'Close' as the target.
*   The trained model was saved to a file named "model.pkl".
*   The input features required by the trained model were identified as \['Open', 'High', 'Low', 'Volume', 'Day', 'Month', 'Weekday'\].

### Insights or Next Steps

*   Evaluate the trained model's performance on the test set using appropriate regression metrics (e.g., Mean Squared Error, R-squared) to understand its predictive accuracy.
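*   Explore feature engineering techniques to potentially improve model performance. Feature scaling is unlikely to help here, because tree-based models such as random forests split on thresholds and are largely insensitive to the scale of individual features.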
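
The evaluation step suggested above could look like the following sketch. It runs on synthetic stand-in data (an assumption, since the notebook's dataset is not reproduced here) with a subset of the feature columns, so the printed numbers say nothing about the real model.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (an assumption), loosely shaped like the notebook's columns.
rng = np.random.default_rng(42)
n = 51
demo = pd.DataFrame({
    "Open": 100 + 2.5 * np.arange(n),
    "Volume": 1000 + 100 * np.arange(n),
})
demo["High"] = demo["Open"] + rng.uniform(0, 5, n)
demo["Low"] = demo["Open"] - rng.uniform(0, 5, n)
demo["Close"] = (demo["High"] + demo["Low"]) / 2

X_demo = demo[["Open", "High", "Low", "Volume"]]
y_demo = demo["Close"]
X_tr, X_te, y_tr, y_te = train_test_split(X_demo, y_demo, test_size=0.2, random_state=42)

demo_model = RandomForestRegressor(random_state=42).fit(X_tr, y_tr)
pred = demo_model.predict(X_te)

# Common regression metrics, computed on the held-out test set only.
mse = mean_squared_error(y_te, pred)
rmse = float(np.sqrt(mse))
mae = mean_absolute_error(y_te, pred)
r2 = r2_score(y_te, pred)

print(f"MSE:  {mse:.3f}")
print(f"RMSE: {rmse:.3f}")
print(f"MAE:  {mae:.3f}")
print(f"R^2:  {r2:.3f}")
```

In the notebook itself, the same three metric calls would take `y_test` and `model.predict(X_test)` as arguments.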


## Make a prediction

### Subtask:
Take user inputs for the features and predict the 'Close' price using the trained model.

**Reasoning**:
Load the saved model, prompt the user to enter values for each required input feature, create a DataFrame from the inputs, and use the model to make a prediction.

In [None]:
import pickle
import pandas as pd

# Load the trained model
with open("model.pkl", "rb") as f:
    loaded_model = pickle.load(f)

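# Get input from the user for each feature, in the order the model was trained on.
# feature_names_in_ is recorded by scikit-learn (1.0+) when a model is fitted on a
# DataFrame, so this cell does not depend on variables from earlier cells.
input_data = {}
for feature in loaded_model.feature_names_in_:
    value = float(input(f"Enter value for {feature}: "))
    input_data[feature] = [value]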

# Create a DataFrame from the input data
input_df = pd.DataFrame(input_data)

# Make a prediction
predicted_close = loaded_model.predict(input_df)

print(f"\nPredicted 'Close' price: {predicted_close[0]}")

Enter value for Open: 299
Enter value for High: 20
Enter value for Low: 18
Enter value for Volume: 200
Enter value for Day: 5
Enter value for Month: 5
Enter value for Weekday: 56

Predicted 'Close' price: 117.22599999999998
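
In the sample session above, the entered values do not describe a plausible trading day ('High' is below 'Open', and 'Weekday' is 56), yet the model still returns a number, because a tree-based model produces a prediction for any numeric input. A minimal sketch of a pre-check follows, assuming the date features should be derived from a single date and that 'Open' should lie between 'Low' and 'High'. `build_input_row` is a hypothetical helper, not part of the notebook.

```python
from datetime import date


def build_input_row(day_str, open_, high, low, volume):
    """Hypothetical helper: validate prices and derive the date features."""
    d = date.fromisoformat(day_str)  # raises ValueError for an invalid date
    if not (low <= open_ <= high):
        raise ValueError("expected Low <= Open <= High")
    if volume < 0:
        raise ValueError("Volume must be non-negative")
    return {
        "Open": [float(open_)],
        "High": [float(high)],
        "Low": [float(low)],
        "Volume": [float(volume)],
        "Day": [d.day],
        "Month": [d.month],
        "Weekday": [d.weekday()],  # Monday=0 ... Sunday=6, same as pandas dt.weekday
    }


row = build_input_row("2023-01-01", 100.0, 105.0, 95.0, 1000)
print(row["Weekday"])  # [6], matching the first row of the dataset

try:
    # Prices from the session above; the year is an assumption.
    build_input_row("2023-05-05", 299, 20, 18, 200)
except ValueError as exc:
    print("rejected:", exc)
```

The returned dictionary has the same keys, in the same order, as the feature list, so it can be passed to `pd.DataFrame(...)` in place of `input_data`.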
