# Task
Load the dataset "My Drive/customer_data.csv" from Google Drive into a DataFrame, then display the first 5 rows along with the column names and their types.

## Mount Google Drive

### Subtask:
Mount your Google Drive to access files stored there.


**Reasoning**:
Mount Google Drive to access the dataset.



In [1]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive
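
The mount call above only works inside Colab, because the `google.colab` package is not available elsewhere. A minimal sketch of a guard, assuming a hypothetical helper named `mount_drive_if_colab` (not part of the original notebook), might look like this:

```python
def mount_drive_if_colab(mountpoint='/content/drive'):
    """Mount Google Drive when running in Colab.

    Returns True if a mount was attempted, False when google.colab
    cannot be imported (for example, on a local machine).
    """
    try:
        # google.colab is only importable inside a Colab runtime.
        from google.colab import drive
    except ImportError:
        return False
    drive.mount(mountpoint)
    return True


mounted = mount_drive_if_colab()
print("Drive mounted:", mounted)
```

Outside Colab the helper simply reports that nothing was mounted, so later cells can decide whether to fall back to a local path.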


The error `NameError: name 'content' is not defined` in cell `YpR3o5d8j64e` occurs because `/content/drive/MyDrive/` is a file path, not Python code, so it cannot be executed directly in a cell.

If you intended to list the contents of that directory, you can use a shell command by prefixing the line with `!`.

In [14]:
!ls /content/drive/MyDrive/

 23STUCHH011171_Pranav-Jaina_Viswam-AI_Report.docx
 23STUCHH011179_Bhanuri_Suchitra_Viswam-AI_IPReport.docx
'23STUCHH011179_PBhanuri_Suchitra_Viswam-AI_Report[1] (1).docx'
'23STUCHH011179_PBhanuri_Suchitra_Viswam-AI_Report[1].docx'
 23STUCHH011179_PBhanuri_Suchitra_Viswam-AI_Report.docx
 2608251516485527.gdoc
'3-02-25 (1).rtf.gdoc'
 3-02-25.rtf.gdoc
 Assignment.gdoc
'Colab Notebooks'
'Copy of 23STUCHH011179_PBhanuri_Suchitra_Viswam-AI_Report.docx'
 DS305_Labexam_Sets15oct2025.gdoc
'DWV ASSIGNMENT 1179.docx'
'DWV ASSIGNMENT.gdoc'
 IMG_20210618_130307.jpg
'koushik resume.pdf'
'Report IP1 (1).gdoc'
'Report IP1.gdoc'
'Report IP1.pdf'
 ReportSuchitra.docx
 Snapchat-1842695788.jpg
 Train.csv
 VISWAM_AI_Seminar_Presentation_11179.pptx
 wordpress.gdoc


In [15]:
!ls /content/drive/MyDrive/Train.csv

/content/drive/MyDrive/Train.csv
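
The same checks can be done in Python rather than with `!ls`, which is useful before calling `pd.read_csv`. Note that the listing above shows `Train.csv` but no `Test.csv`, so checking each path first gives a clearer message than a failed read. The following is a minimal sketch using `pathlib`; the helper names `list_names` and `has_file` are hypothetical and only for illustration:

```python
from pathlib import Path


def list_names(directory):
    """Return the sorted names of the entries directly inside `directory`."""
    return sorted(p.name for p in Path(directory).iterdir())


def has_file(directory, filename):
    """Return True if `filename` exists as a regular file inside `directory`."""
    return (Path(directory) / filename).is_file()


# In Colab, after mounting, the directory would be '/content/drive/MyDrive'.
# The current directory is used here so the snippet runs anywhere.
print(list_names('.'))
print(has_file('.', 'Train.csv'))
```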


In [21]:
# --- Step 1: Import libraries ---
import pandas as pd
from sklearn.preprocessing import LabelEncoder, StandardScaler

# --- Step 2: Load datasets ---
train_path = '/content/drive/MyDrive/Train.csv'  # Update if your file is in another folder
test_path = '/content/drive/MyDrive/Test.csv'

train_df = pd.read_csv(train_path)
test_df = pd.read_csv(test_path)

print("Train shape:", train_df.shape)
print("Test shape:", test_df.shape)
print("\nPreview of Train data:")
print(train_df.head())

# --- Step 3: Handle missing values ---
# Fill numeric missing values with mean
train_df = train_df.fillna(train_df.mean(numeric_only=True))
test_df = test_df.fillna(test_df.mean(numeric_only=True))

# --- Step 4: Encode categorical columns ---
label_enc = LabelEncoder()

# Find categorical columns (object/string type)
cat_cols = train_df.select_dtypes(include=['object']).columns

for col in cat_cols:
    # Combine train + test for consistent encoding
    combined = pd.concat([train_df[col], test_df[col]], axis=0)
    label_enc.fit(combined.astype(str))
    train_df[col] = label_enc.transform(train_df[col].astype(str))
    test_df[col] = label_enc.transform(test_df[col].astype(str))

# --- Step 5: Separate features and target ---
target = 'Power'

X_train = train_df.drop(columns=[target])
y_train = train_df[target]

# Note: test data usually has no target column
if target in test_df.columns:
    X_test = test_df.drop(columns=[target])
    y_test = test_df[target]
else:
    X_test = test_df.copy()
    y_test = None

# --- Step 6: Scale numeric data ---
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

print("\n✅ Data preprocessing completed successfully!")
print("Training features shape:", X_train_scaled.shape)
print("Test features shape:", X_test_scaled.shape)


Train shape: (140160, 12)
Test shape: (35040, 11)

Preview of Train data:
   Unnamed: 0              Time  Location  Temp_2m  RelHum_2m      DP_2m  \
0           0  02-01-2013 00:00         1  28.2796  84.664205  24.072595   
1           1  02-01-2013 01:00         1  28.1796  85.664205  24.272595   
2           2  02-01-2013 02:00         1  26.5796  90.664205  24.072595   
3           3  02-01-2013 03:00         1  27.1796  87.664205  23.872595   
4           4  02-01-2013 04:00         1  27.0796  87.664205  23.672595   

     WS_10m   WS_100m      WD_10m     WD_100m    WG_10m     Power  
0  1.605389  1.267799  145.051683  161.057315  1.336515  0.163496  
1  2.225389  3.997799  150.051683  157.057315  4.336515  0.142396  
2  1.465389  2.787799  147.051683  149.057315  3.136515  0.121396  
3  1.465389  2.697799   57.051683  104.057315  1.536515  0.100296  
4  2.635389  4.437799   57.051683   83.057315  3.936515  0.079296  

✅ Data preprocessing completed successfully!
Training featur
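
The task also asks for the columns and their types, which the output above does not show. Below is a minimal sketch of that step. It assumes a small hypothetical in-memory sample in place of `Train.csv` (a few of the columns from the preview), and the helper name `preview` is illustrative rather than part of the original notebook:

```python
import io

import pandas as pd

# Hypothetical two-row sample mimicking a few of the Train.csv columns.
SAMPLE_CSV = """Time,Location,Temp_2m,Power
02-01-2013 00:00,1,28.2796,0.163496
02-01-2013 01:00,1,28.1796,0.142396
"""


def preview(df, n=5):
    """Return the first n rows and a {column: dtype name} mapping."""
    dtypes = {col: str(dtype) for col, dtype in df.dtypes.items()}
    return df.head(n), dtypes


sample_df = pd.read_csv(io.StringIO(SAMPLE_CSV))
head, dtypes = preview(sample_df)
print(head)
print(dtypes)
```

With the real data, the same call would be `preview(train_df)`, run right after loading and before the label-encoding step, so that the original types of columns such as `Time` are still visible.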