# Task
Load the dataset "My Drive/customer_data.csv" from Google Drive into a DataFrame, then display the first 5 rows along with the column names and their data types.

## Mount Google Drive

### Subtask:
Mount your Google Drive to access files stored there.


**Reasoning**:
Mount Google Drive to access the dataset.



In [6]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


The error `NameError: name 'content' is not defined` in cell `YpR3o5d8j64e` occurs because `/content/drive/MyDrive/` is a file path, not valid Python code that can be directly executed.

If you intended to list the contents of that directory, you can use a shell command by prefixing the line with `!`.

In [7]:
!ls /content/drive/MyDrive/

 23STUCHH011171_Pranav-Jaina_Viswam-AI_Report.docx
 23STUCHH011179_Bhanuri_Suchitra_Viswam-AI_IPReport.docx
'23STUCHH011179_PBhanuri_Suchitra_Viswam-AI_Report[1] (1).docx'
'23STUCHH011179_PBhanuri_Suchitra_Viswam-AI_Report[1].docx'
 23STUCHH011179_PBhanuri_Suchitra_Viswam-AI_Report.docx
 2608251516485527.gdoc
'3-02-25 (1).rtf.gdoc'
 3-02-25.rtf.gdoc
 Assignment.gdoc
'Colab Notebooks'
'Copy of 23STUCHH011179_PBhanuri_Suchitra_Viswam-AI_Report.docx'
 DS305_Labexam_Sets15oct2025.gdoc
'DWV ASSIGNMENT 1179.docx'
'DWV ASSIGNMENT.gdoc'
 IMG_20210618_130307.jpg
'koushik resume.pdf'
'Report IP1 (1).gdoc'
'Report IP1.gdoc'
'Report IP1.pdf'
 ReportSuchitra.docx
 Snapchat-1842695788.jpg
 Test.csv
 Train.csv
 VISWAM_AI_Seminar_Presentation_11179.pptx
 wordpress.gdoc
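The `!ls` shell escape above is Colab/IPython-specific. A pure-Python way to list the same folder is sketched below using only the standard library's `pathlib`; the helper name `list_directory` is hypothetical, and its alphabetical sort may order entries slightly differently from `ls`.

```python
from pathlib import Path


def list_directory(path):
    """Return the names of the entries (files and folders) in `path`, sorted."""
    return sorted(entry.name for entry in Path(path).iterdir())


# Usage in Colab after mounting (order may differ slightly from `ls`):
# for name in list_directory("/content/drive/MyDrive"):
#     print(name)
```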


In [8]:
!ls /content/drive/MyDrive/Train.csv

/content/drive/MyDrive/Train.csv
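The task asks for the first 5 rows and the column types, and the cells below print `head()` and the shapes but never the dtypes. A minimal sketch of that step follows. So that it runs anywhere, it reads a small inline CSV built from four columns of the Train.csv preview shown in this notebook; in Colab you would pass the real path to `pd.read_csv` instead.

```python
import io

import pandas as pd

# Inline sample: four columns from the Train.csv preview in this notebook.
# In Colab, read the real file instead, e.g.
# df = pd.read_csv("/content/drive/MyDrive/Train.csv")
sample_csv = io.StringIO(
    "Time,Location,Temp_2m,Power\n"
    "02-01-2013 00:00,1,28.2796,0.163496\n"
    "02-01-2013 01:00,1,28.1796,0.142396\n"
    "02-01-2013 02:00,1,26.5796,0.121396\n"
    "02-01-2013 03:00,1,27.1796,0.100296\n"
    "02-01-2013 04:00,1,27.0796,0.079296\n"
)
df = pd.read_csv(sample_csv)

print(df.head())  # first 5 rows
print(df.dtypes)  # each column name with its dtype (text columns show as object/str)
df.info()         # dtypes plus non-null counts and memory usage
```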


In [9]:
# --- Step 1: Import libraries ---
import pandas as pd
from sklearn.preprocessing import LabelEncoder, StandardScaler

# --- Step 2: Load datasets ---
train_path = '/content/drive/MyDrive/Train.csv'  # Update if your file is in another folder
test_path = '/content/drive/MyDrive/Test.csv'

train_df = pd.read_csv(train_path)
test_df = pd.read_csv(test_path)

print("Train shape:", train_df.shape)
print("Test shape:", test_df.shape)
print("\nPreview of Train data:")
print(train_df.head())

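# --- Step 3: Handle missing values ---
# Fill numeric missing values with the training-set means; reusing them for the
# test set keeps test statistics out of preprocessing
train_means = train_df.mean(numeric_only=True)
train_df = train_df.fillna(train_means)
test_df = test_df.fillna(train_means)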

# --- Step 4: Encode categorical columns ---
label_enc = LabelEncoder()

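# Find categorical columns (object/string type). In this dataset that is the
# 'Time' column, so each timestamp string becomes an arbitrary integer code.
cat_cols = train_df.select_dtypes(include=['object']).columns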

for col in cat_cols:
    # Combine train + test for consistent encoding
    combined = pd.concat([train_df[col], test_df[col]], axis=0)
    label_enc.fit(combined.astype(str))
    train_df[col] = label_enc.transform(train_df[col].astype(str))
    test_df[col] = label_enc.transform(test_df[col].astype(str))

# --- Step 5: Separate features and target ---
target = 'Power'

X_train = train_df.drop(columns=[target])
y_train = train_df[target]

# Note: Test data usually has no 'Power' column
if target in test_df.columns:
    X_test = test_df.drop(columns=[target])
    y_test = test_df[target]
else:
    X_test = test_df.copy()
    y_test = None

# --- Step 6: Scale numeric data ---
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

print("\n✅ Data preprocessing completed successfully!")
print("Training features shape:", X_train_scaled.shape)
print("Test features shape:", X_test_scaled.shape)


Train shape: (140160, 12)
Test shape: (35040, 11)

Preview of Train data:
   Unnamed: 0              Time  Location  Temp_2m  RelHum_2m      DP_2m  \
0           0  02-01-2013 00:00         1  28.2796  84.664205  24.072595   
1           1  02-01-2013 01:00         1  28.1796  85.664205  24.272595   
2           2  02-01-2013 02:00         1  26.5796  90.664205  24.072595   
3           3  02-01-2013 03:00         1  27.1796  87.664205  23.872595   
4           4  02-01-2013 04:00         1  27.0796  87.664205  23.672595   

     WS_10m   WS_100m      WD_10m     WD_100m    WG_10m     Power  
0  1.605389  1.267799  145.051683  161.057315  1.336515  0.163496  
1  2.225389  3.997799  150.051683  157.057315  4.336515  0.142396  
2  1.465389  2.787799  147.051683  149.057315  3.136515  0.121396  
3  1.465389  2.697799   57.051683  104.057315  1.536515  0.100296  
4  2.635389  4.437799   57.051683   83.057315  3.936515  0.079296  

✅ Data preprocessing completed successfully!
Training feat
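The imputation and encoding steps follow a "learn from the training data, apply to both sets" pattern. Filling test gaps with training-set means keeps test statistics out of preprocessing, and fitting one `LabelEncoder` on the union of train and test categories gives a category the same integer code in both frames. The toy frames below are invented for illustration only.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Hypothetical toy data with one numeric and one categorical column.
train = pd.DataFrame({"Temp": [20.0, np.nan, 24.0], "Site": ["A", "B", "A"]})
test = pd.DataFrame({"Temp": [np.nan, 30.0], "Site": ["C", "B"]})

# 1) Impute with statistics learned from the training set only (mean Temp = 22.0).
train_means = train.mean(numeric_only=True)
train = train.fillna(train_means)
test = test.fillna(train_means)

# 2) Fit the encoder on all categories so "B" maps to the same code in both frames.
enc = LabelEncoder()
enc.fit(pd.concat([train["Site"], test["Site"]]).astype(str))
train["Site"] = enc.transform(train["Site"].astype(str))
test["Site"] = enc.transform(test["Site"].astype(str))

print(train)
print(test)
```

`LabelEncoder` assigns codes in sorted order of the category strings, so A, B, C become 0, 1, 2 here.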

In [10]:
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
import numpy as np

In [11]:
model = LinearRegression()
model.fit(X_train_scaled, y_train)

print("✅ Model training complete!")


✅ Model training complete!
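The scaler and the regression are fitted in separate steps here, so the same fitted `scaler` must be reused on the test features. scikit-learn's `make_pipeline` bundles both so that `predict` applies the training-set scaling automatically. A minimal sketch on synthetic, noise-free data (invented for illustration):

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic features and an exactly linear target (no noise).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = 2.0 * X[:, 0] - 1.0 * X[:, 1] + 0.5

# The scaler is fitted on X during fit() and reused inside predict().
model = make_pipeline(StandardScaler(), LinearRegression())
model.fit(X, y)

print(model.predict(X[:3]))
print(model.score(X, y))  # R² on the training data
```

For unregularized `LinearRegression` with an intercept, standardizing the features changes the coefficients but not the predictions; scaling matters more for regularized or distance-based models.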


In [12]:
y_pred_test = model.predict(X_test_scaled)
print("Predictions on Test data:")
print(y_pred_test[:10])


Predictions on Test data:
[-0.04979993  0.00850635  0.05840822  0.10189542  0.12253005  0.09845246
  0.07454005  0.09562673  0.20506185  0.27197684]
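A linear model is unbounded, which is why the first prediction above is negative. Assuming `Power` cannot be negative (an assumption based on the values in the preview, not stated in the data description), a simple post-processing step is to clamp predictions at 0 with NumPy's `np.clip`:

```python
import numpy as np

# First four test predictions printed above (copied from the notebook output).
y_pred_test = np.array([-0.04979993, 0.00850635, 0.05840822, 0.10189542])

# Assumption: Power is non-negative, so clamp at 0 and leave the upper end open.
y_pred_clipped = np.clip(y_pred_test, 0.0, None)
print(y_pred_clipped)
```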


In [18]:
# Test.csv has no 'Power' column (y_test is None), so this block prints nothing.
if y_test is not None:
    y_pred = model.predict(X_test_scaled)
    mae = mean_absolute_error(y_test, y_pred)
    mse = mean_squared_error(y_test, y_pred)
    rmse = np.sqrt(mse)
    r2 = r2_score(y_test, y_pred)

    print("\n📊 Evaluation Results:")
    print("MAE :", mae)
    print("RMSE:", rmse)
    print("R² Score:", r2)
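The three metrics used here can be checked by hand on a tiny example. The arrays below are invented; the comments show the arithmetic behind MAE, RMSE and R² = 1 − SS_res / SS_tot, and the results agree with the scikit-learn functions imported above.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

y_true = np.array([1.0, 2.0, 3.0])
y_pred = np.array([1.0, 2.0, 4.0])

mae = np.mean(np.abs(y_true - y_pred))           # (0 + 0 + 1) / 3
rmse = np.sqrt(np.mean((y_true - y_pred) ** 2))  # sqrt(1 / 3)
ss_res = np.sum((y_true - y_pred) ** 2)          # 1
ss_tot = np.sum((y_true - y_true.mean()) ** 2)   # 1 + 0 + 1 = 2
r2 = 1 - ss_res / ss_tot                         # 1 - 1/2 = 0.5

print(f"MAE={mae:.4f} RMSE={rmse:.4f} R²={r2:.4f}")  # MAE=0.3333 RMSE=0.5774 R²=0.5000
print(np.isclose(mae, mean_absolute_error(y_true, y_pred)),
      np.isclose(rmse, np.sqrt(mean_squared_error(y_true, y_pred))),
      np.isclose(r2, r2_score(y_true, y_pred)))
```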


In [21]:
# --- Step 1: Import libraries ---
import pandas as pd
from sklearn.preprocessing import LabelEncoder, StandardScaler
from sklearn.linear_model import LinearRegression

from google.colab import drive
drive.mount('/content/drive')

# --- Step 2: Load datasets ---
train_path = '/content/drive/MyDrive/Train.csv'  # Update if your file is in another folder
test_path = '/content/drive/MyDrive/Test.csv'

train_df = pd.read_csv(train_path)
test_df = pd.read_csv(test_path)


print("✅ Data Loaded Successfully!")
print("Train Shape:", train_df.shape)
print("Test Shape:", test_df.shape)

# --- Step 3: Handle missing values (training-set means for both sets) ---
train_means = train_df.mean(numeric_only=True)
train_df = train_df.fillna(train_means)
test_df = test_df.fillna(train_means)

# --- Step 4: Encode categorical columns ---
label_enc = LabelEncoder()
for col in train_df.select_dtypes(include=['object']).columns:
    combined = pd.concat([train_df[col], test_df[col]], axis=0)
    label_enc.fit(combined.astype(str))
    train_df[col] = label_enc.transform(train_df[col].astype(str))
    test_df[col] = label_enc.transform(test_df[col].astype(str))

# --- Step 5: Separate features and target ---
target = "Power"
X_train = train_df.drop(columns=[target])
y_train = train_df[target]
X_test = test_df.copy()  # test has no target column

# --- Step 6: Scale features ---
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

# --- Step 7: Train model ---
model = LinearRegression()
model.fit(X_train_scaled, y_train)
print("✅ Model Trained Successfully!")

# --- Step 8: Predict on test data ---
y_pred = model.predict(X_test_scaled)
print("\n✅ Predictions generated successfully!")
print("First 10 predicted Power values:")
print(y_pred[:10])

# --- Step 9: Prepare output file ---
# If test data has an ID column, include it
id_columns = [col for col in test_df.columns if col.lower() in ['id', 'index']]
if id_columns:
    output = pd.DataFrame({
        id_columns[0]: test_df[id_columns[0]],
        'Predicted_Power': y_pred
    })
else:
    # If no ID column, create one
    output = pd.DataFrame({
        'ID': range(1, len(y_pred) + 1),
        'Predicted_Power': y_pred
    })

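# --- Step 10: Save predictions ---
# A relative path writes to Colab's working directory (/content by default), which is
# cleared when the runtime resets; use a /content/drive/MyDrive/... path to keep the file.
output.to_csv('Predicted_Power.csv', index=False)
print("\n💾 Predictions saved to 'Predicted_Power.csv'")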
print("\nPreview of saved file:")
print(output.head())


Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).
✅ Data Loaded Successfully!
Train Shape: (140160, 12)
Test Shape: (35040, 11)
✅ Model Trained Successfully!

✅ Predictions generated successfully!
First 10 predicted Power values:
[-0.04979993  0.00850635  0.05840822  0.10189542  0.12253005  0.09845246
  0.07454005  0.09562673  0.20506185  0.27197684]

💾 Predictions saved to 'Predicted_Power.csv'

Preview of saved file:
   ID  Predicted_Power
0   1        -0.049800
1   2         0.008506
2   3         0.058408
3   4         0.101895
4   5         0.122530
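The ID detection above only matches columns named `id` or `index`, so it falls back to a generated 1-based counter. The Train.csv preview has an `Unnamed: 0` column (what pandas names an index column saved without a header), and Test.csv presumably has one too. The sketch below extends the candidate list to cover it; `ID_CANDIDATES` and `build_output` are hypothetical names, and the toy test frame is invented.

```python
import io

import pandas as pd

# Hypothetical list of column names treated as row identifiers (compared lowercased).
ID_CANDIDATES = ("id", "index", "unnamed: 0")


def build_output(test_df, y_pred, candidates=ID_CANDIDATES):
    """Pair each prediction with an ID column from test_df, or a 1-based counter."""
    id_cols = [c for c in test_df.columns if c.lower() in candidates]
    ids = test_df[id_cols[0]].to_numpy() if id_cols else range(1, len(y_pred) + 1)
    return pd.DataFrame({"ID": ids, "Predicted_Power": y_pred})


test_df = pd.DataFrame({"Unnamed: 0": [0, 1, 2], "Temp_2m": [28.1, 27.9, 26.5]})
out = build_output(test_df, [0.10, 0.20, 0.30])

buf = io.StringIO()
out.to_csv(buf, index=False)  # in Colab, e.g. "/content/drive/MyDrive/Predicted_Power.csv"
print(buf.getvalue())
```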


In [26]:
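from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn.model_selection import train_test_split
import numpy as np

# X_train_split, X_val_split, y_train_split and y_val_split are not created in any cell
# shown here. This 80/20 random split of the scaled training data is an assumption,
# so the validation scores below may not reproduce exactly.
X_train_split, X_val_split, y_train_split, y_val_split = train_test_split(
    X_train_scaled, y_train, test_size=0.2, random_state=42
)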

rf = RandomForestRegressor(
    n_estimators=300,
    max_depth=None,
    min_samples_split=2,
    min_samples_leaf=1,
    random_state=42,
    n_jobs=-1
)

rf.fit(X_train_split, y_train_split)
y_val_pred = rf.predict(X_val_split)

mae = mean_absolute_error(y_val_split, y_val_pred)
rmse = np.sqrt(mean_squared_error(y_val_split, y_val_pred))
r2 = r2_score(y_val_split, y_val_pred)

print("\nModel Accuracy on Validation Data:")
print(f"MAE  : {mae:.4f}")
print(f"RMSE : {rmse:.4f}")
print(f"R²   : {r2:.4f}")



Model Accuracy on Validation Data:
MAE  : 0.0737
RMSE : 0.1054
R²   : 0.8272
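The data are hourly records (the `Time` column) for several locations. With a random split, neighbouring hours can land in both training and validation sets, which may make validation scores optimistic. One alternative, sketched below under the assumption that the timestamps have already been parsed to datetimes, is to hold out the latest timestamps. `chronological_split` is a hypothetical helper; scikit-learn's `TimeSeriesSplit` is another option for cross-validation.

```python
import numpy as np
import pandas as pd


def chronological_split(X, y, times, val_fraction=0.2):
    """Train on earlier timestamps and validate on the latest `val_fraction` of them.

    Rows that share a timestamp (e.g. several locations) stay on the same side.
    """
    times = pd.DatetimeIndex(times)
    unique_times = times.unique().sort_values()
    cutoff = unique_times[int(len(unique_times) * (1 - val_fraction))]
    is_val = np.asarray(times >= cutoff)
    return X[~is_val], X[is_val], y[~is_val], y[is_val]


# Toy data: 2 locations x 10 hourly timestamps, stacked location by location.
hours = pd.date_range("2013-01-01", periods=10, freq="60min").to_numpy()
times = np.concatenate([hours, hours])
X = np.arange(20, dtype=float).reshape(-1, 1)
y = np.arange(20, dtype=float)

X_tr, X_va, y_tr, y_va = chronological_split(X, y, times, val_fraction=0.2)
print(len(X_tr), len(X_va))  # 16 4
```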
