### Step 1: Import Libraries
We import the libraries used throughout: `numpy` for numerical arrays and random number generation, `pandas` for displaying the data as tables, `PolynomialFeatures` for generating polynomial terms, `LinearRegression` for our model, `train_test_split` for splitting data, and `r2_score` for evaluation.

In [1]:
import numpy as np
import pandas as pd
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

### Step 2: Generate Random Data
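We create 1000 random values between 0 and 1 for each of `x1`, `x2`, and the additive term `c` using `np.random.rand()`, after fixing the seed with `np.random.seed(0)` so the run is reproducible. `x1` and `x2` form the basis of our features, while `c` (despite being called a constant, it varies per row) adds randomness to the data.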


In [2]:
np.random.seed(0)
x1 = np.random.rand(1000)
x2 = np.random.rand(1000)
c = np.random.rand(1000)

df = pd.DataFrame({'x1': x1, 'x2': x2, 'c': c})

df.head(10)

|   | x1 | x2 | c |
|---|----|----|---|
| 0 | 0.548814 | 0.59288 | 0.811518 |
| 1 | 0.715189 | 0.010064 | 0.476084 |
| 2 | 0.602763 | 0.475826 | 0.523156 |
| 3 | 0.544883 | 0.70877 | 0.250521 |
| 4 | 0.423655 | 0.043975 | 0.605043 |
| 5 | 0.645894 | 0.879521 | 0.302905 |
| 6 | 0.437587 | 0.520081 | 0.577284 |
| 7 | 0.891773 | 0.030661 | 0.169678 |
| 8 | 0.963663 | 0.224414 | 0.159469 |
| 9 | 0.383442 | 0.953676 | 0.41703 |
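
`np.random.seed` and `np.random.rand` belong to NumPy's legacy random interface. As an optional alternative, the sketch below draws the same kind of data with the newer `Generator` API. The names `rng`, `n_samples`, `x1_alt`, `x2_alt`, `c_alt`, and `df_alt` are illustrative and not part of the original notebook, and the drawn values will differ from the table above because the two interfaces produce different random streams.

```python
import numpy as np
import pandas as pd

n_samples = 1000                    # same sample size as in this step
rng = np.random.default_rng(0)      # seeded generator; leaves the global state untouched

# Each call draws uniform values in the half-open interval [0, 1)
x1_alt = rng.random(n_samples)
x2_alt = rng.random(n_samples)
c_alt = rng.random(n_samples)

df_alt = pd.DataFrame({'x1': x1_alt, 'x2': x2_alt, 'c': c_alt})
print(df_alt.describe().loc[['min', 'max']])
```

If you switch to a `Generator`, consider passing an explicit `random_state` to `train_test_split` later on, since without one it falls back to the global NumPy random state that `np.random.seed` controls.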


### Step 3: Define Target Variable
Using the given function $y = x_1^2 + 3 \cdot x_2 + c$, we calculate the target variable `y`. The squared `x1` term makes the relationship between the features and the target non-linear.


In [3]:
y = x1**2 + 3*x2 + c
df['y'] = y

df.head(10)

|   | x1 | x2 | c | y |
|---|----|----|---|---|
| 0 | 0.548814 | 0.59288 | 0.811518 | 2.891356 |
| 1 | 0.715189 | 0.010064 | 0.476084 | 1.017771 |
| 2 | 0.602763 | 0.475826 | 0.523156 | 2.313958 |
| 3 | 0.544883 | 0.70877 | 0.250521 | 2.673729 |
| 4 | 0.423655 | 0.043975 | 0.605043 | 0.916453 |
| 5 | 0.645894 | 0.879521 | 0.302905 | 3.358648 |
| 6 | 0.437587 | 0.520081 | 0.577284 | 2.329011 |
| 7 | 0.891773 | 0.030661 | 0.169678 | 1.05692 |
| 8 | 0.963663 | 0.224414 | 0.159469 | 1.761356 |
| 9 | 0.383442 | 0.953676 | 0.41703 | 3.425084 |


### Step 4: Prepare Polynomial Features
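We combine `x1` and `x2` into a single feature matrix `X`. Note that `c` is not included, so the model sees its contribution to `y` only as noise. Using `PolynomialFeatures(degree=2)`, we expand `X` into all terms up to the second degree (a bias column, `x1`, `x2`, `x1^2`, `x1*x2`, and `x2^2`), enabling our otherwise linear model to capture non-linear patterns.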


In [4]:
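# Stack the two features into a (1000, 2) matrix; c is left out
X = np.column_stack((x1, x2))
# Expand to all polynomial terms up to degree 2 (6 columns in total)
poly = PolynomialFeatures(degree=2)
X_poly = poly.fit_transform(X)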

### Step 5: Split Data into Train and Test Sets
We split the polynomial-transformed data into training and testing sets using an 80-20 split, which allows us to evaluate model performance on unseen data.

In [5]:
X_train, X_test, y_train, y_test = train_test_split(X_poly, y, test_size=0.2)
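
When no `random_state` is given, `train_test_split` draws from the global NumPy random state, so the split above is reproducible only if the cells are run in order after `np.random.seed(0)`. Passing `random_state` explicitly makes the split independent of cell order. The following is a small sketch on stand-in data; the value `42` is arbitrary, the names `X_demo`, `y_demo`, `X_tr`, `X_te`, `y_tr`, and `y_te` are illustrative, and using it in the notebook would produce a different split and score than the outputs shown in this walkthrough.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Stand-in data with the same shape as X_poly (1000 rows, 6 columns)
X_demo = np.arange(6000, dtype=float).reshape(1000, 6)
y_demo = np.arange(1000, dtype=float)

X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42  # 42 is an arbitrary seed
)
print(X_tr.shape, X_te.shape)  # 800 training rows, 200 test rows
```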

### Step 6: Train the Model
We initialize a `LinearRegression` model and fit it on the training data. The model remains linear in its coefficients, but because its inputs are polynomial features, it can now capture the non-linear relationship in the data.

In [6]:
model = LinearRegression()
model.fit(X_train, y_train)
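
A common alternative is to chain the feature expansion and the regression in one scikit-learn pipeline, so that raw `x1` and `x2` values can be passed directly to `fit` and `predict`. The following is a minimal sketch of that variant, not the approach used in this notebook. It regenerates the data so it can run on its own, and `pipe`, `pred`, and `random_state=0` are illustrative choices.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Recreate the data from the earlier steps
np.random.seed(0)
x1 = np.random.rand(1000)
x2 = np.random.rand(1000)
c = np.random.rand(1000)
y = x1**2 + 3*x2 + c
X = np.column_stack((x1, x2))

# Split the raw features; the pipeline expands them internally
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

pipe = make_pipeline(PolynomialFeatures(degree=2), LinearRegression())
pipe.fit(X_train, y_train)

# Predict for a single raw point (x1 = 0.5, x2 = 0.5)
pred = pipe.predict(np.array([[0.5, 0.5]]))[0]
print(pred)
```

Since `c` averages about 0.5 and is invisible to the model, the prediction for this point should land near 0.5² + 3 × 0.5 + 0.5 = 2.25.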

### Step 7: Predict and Evaluate Model
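We use the trained model to predict `y` values on the test set. We then calculate the R-squared score, which measures the share of the variance in `y_test` that the predictions explain. A score closer to 1 indicates a better fit. Because `c` is not among the model's features, its contribution to `y` cannot be predicted, so a score somewhat below 1 is expected here.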

In [7]:
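# Predict on the held-out test set and score the predictions
y_pred = model.predict(X_test)
score = r2_score(y_test, y_pred)

# Put actual and predicted values side by side for inspection
comparison_df = pd.DataFrame({'Actual': y_test, 'Predicted': y_pred})

print(f"\nR-squared Score for Polynomial Regression: {score}")
comparison_df.head(10)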



R-squared Score for Polynomial Regression: 0.9115183706757247


|   | Actual | Predicted |
|---|--------|-----------|
| 0 | 1.773836 | 2.01696 |
| 1 | 1.438244 | 1.409184 |
| 2 | 1.510336 | 1.329384 |
| 3 | 3.366689 | 3.043347 |
| 4 | 1.017771 | 1.015664 |
| 5 | 3.673129 | 3.407609 |
| 6 | 1.336461 | 1.319386 |
| 7 | 1.631265 | 1.61387 |
| 8 | 1.468211 | 1.281601 |
| 9 | 2.97743 | 2.735278 |
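
The score of about 0.91 is close to what the data-generating process allows. As a rough back-of-the-envelope check, assume `x1`, `x2`, and `c` are independent and uniform on [0, 1], as generated in Step 2. The variance of `y` then splits into three parts, and only the part due to `c` is out of the model's reach, because `c` is not a feature:

```latex
\begin{aligned}
\operatorname{Var}(x_1^2) &= E[x_1^4] - \left(E[x_1^2]\right)^2 = \tfrac{1}{5} - \tfrac{1}{9} = \tfrac{4}{45} \approx 0.089 \\
\operatorname{Var}(3 x_2) &= 9 \cdot \tfrac{1}{12} = 0.75 \\
\operatorname{Var}(c) &= \tfrac{1}{12} \approx 0.083 \\
\operatorname{Var}(y) &\approx 0.089 + 0.75 + 0.083 = 0.922 \\
R^2_{\max} &\approx 1 - \frac{\operatorname{Var}(c)}{\operatorname{Var}(y)} \approx 1 - \frac{0.083}{0.922} \approx 0.91
\end{aligned}
```

The observed 0.9115 on 200 test points is in line with this estimate. The small difference is plausibly down to sampling variation, and a score much closer to 1 would require giving the model access to `c`.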
