###### Label Encoding Basic Example One

##### 🎯 What is Label Encoding?

→ Label Encoding converts text (categorical data) into numbers.
This is needed because most machine learning models work with numeric input, not words.

Example:
| Gender |
|---------|
| Male    |
| Female  |
| Female  |
| Male    |

After Label Encoding:
| Gender |
|---------|
| 1 |
| 0 |
| 0 |
| 1 |

---

⚙️ Class Used: LabelEncoder

It comes from sklearn.preprocessing.

Example:
    from sklearn.preprocessing import LabelEncoder
    import pandas as pd

    # Sample data
    df = pd.DataFrame({
        'Gender': ['Male', 'Female', 'Female', 'Male']
    })

    # Create the encoder
    encoder = LabelEncoder()

    # Apply encoding
    df['Gender'] = encoder.fit_transform(df['Gender'])

    print(df)

Output:
| Gender |
|---------|
| 1 |
| 0 |
| 0 |
| 1 |

---

💡 How It Works:
- fit() → learns the unique categories (e.g., "Male", "Female")
- transform() → converts them to numbers
- fit_transform() → does both at once
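
The three calls above can be run separately to see what each one does. This is a minimal sketch: the names `genders` and `codes` are illustrative and not part of the original example, while `classes_` and `inverse_transform()` are standard `LabelEncoder` members.

```python
from sklearn.preprocessing import LabelEncoder

genders = ['Male', 'Female', 'Female', 'Male']  # illustrative sample data

encoder = LabelEncoder()

# fit() only learns the unique categories; nothing is converted yet
encoder.fit(genders)
print(encoder.classes_)  # learned categories, stored in sorted order

# transform() maps each value to the index of its category in classes_
codes = encoder.transform(genders)
print(codes)

# inverse_transform() maps the numbers back to the original labels
print(encoder.inverse_transform(codes))
```

Keeping `fit()` and `transform()` separate is useful when the same learned mapping has to be reused on new data later.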

---

⚠️ Note:
Label encoding assigns numbers in **alphabetical order**.
Example:
    "Female" → 0
    "Male"   → 1

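LabelEncoder's alphabetical numbering does not follow a natural order such as "Low" < "Medium" < "High" (it gives High = 0, Low = 1, Medium = 2). For ordered categories, set the order explicitly instead, for example with OrdinalEncoder and a categories list.
For categories with no natural order (like colors or regions), prefer One-Hot Encoding so the numbers do not imply a ranking. LabelEncoder itself is intended for target labels (y), not input features.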

In [1]:
from sklearn.preprocessing import LabelEncoder

In [2]:
colors=['Pink','Skyblue','Teal','Beige','Lilac','Mint','Nude']
le=LabelEncoder()

# Fit & Transform()
encoded=le.fit_transform(colors)
print(encoded)

[4 5 6 0 1 2 3]


In [3]:
from sklearn.preprocessing import OneHotEncoder
import pandas as pd

In [6]:
data=pd.DataFrame({'Color':['Pink','Skyblue','Teal','Beige','Pink']})
# Create Encoder
ohe=OneHotEncoder(sparse_output=False)

# Fit & Transform
encoded=ohe.fit_transform(data[['Color']])
print()
encoded_df=pd.DataFrame(encoded,columns=ohe.get_feature_names_out(['Color']))
print(encoded_df)


   Color_Beige  Color_Pink  Color_Skyblue  Color_Teal
0          0.0         1.0            0.0         0.0
1          0.0         0.0            1.0         0.0
2          0.0         0.0            0.0         1.0
3          1.0         0.0            0.0         0.0
4          0.0         1.0            0.0         0.0


In [8]:
from sklearn.preprocessing import LabelEncoder
import pandas as pd

data=pd.DataFrame({'Satisfaction':['Low','Medium','High','Low']})
le=LabelEncoder()
print()
data['Satisfaction_Encoded']=le.fit_transform(data['Satisfaction'])
print(data)


  Satisfaction  Satisfaction_Encoded
0          Low                     1
1       Medium                     2
2         High                     0
3          Low                     1
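
In the output above, `LabelEncoder` gives High = 0, Low = 1, Medium = 2, which follows the alphabet and not the natural ranking. One possible way to keep the ranking is `OrdinalEncoder` with an explicit `categories` list. This is a sketch that assumes Low < Medium < High is the intended order; the names `order` and `oe` are illustrative.

```python
import pandas as pd
from sklearn.preprocessing import OrdinalEncoder

data = pd.DataFrame({'Satisfaction': ['Low', 'Medium', 'High', 'Low']})

# Assumption: the intended order is Low < Medium < High
order = [['Low', 'Medium', 'High']]
oe = OrdinalEncoder(categories=order)

# OrdinalEncoder expects 2-D input, hence the double brackets;
# the result is a float column vector, so flatten it and cast to int
encoded = oe.fit_transform(data[['Satisfaction']])
data['Satisfaction_Encoded'] = encoded.ravel().astype(int)
print(data)
```

With this ordering, Low maps to 0, Medium to 1, and High to 2, so larger numbers mean higher satisfaction.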


In [11]:
import pandas as pd
from sklearn.preprocessing import OneHotEncoder

data=pd.DataFrame({'Region':['North','South','East','West','South']})
ohe=OneHotEncoder(sparse_output=False)
encoded=ohe.fit_transform(data[['Region']])
print()
encoded_df=pd.DataFrame(encoded,columns=ohe.get_feature_names_out(['Region']))
print(encoded_df)


   Region_East  Region_North  Region_South  Region_West
0          0.0           1.0           0.0          0.0
1          0.0           0.0           1.0          0.0
2          1.0           0.0           0.0          0.0
3          0.0           0.0           0.0          1.0
4          0.0           0.0           1.0          0.0


##### Linear Regression: Find the Best Fitting Straight Line

##### Steps:
1. Collect the data
2. Create the model: LinearRegression()
3. Train the model: fit(X, y)
4. Predict the result: predict([[value_to_predict]])
5. Visualize the best-fit line with a plot (optional)

In [12]:
import numpy as np
from sklearn.linear_model import LinearRegression

In [13]:
# Step 1
hours_studied=np.array([[1],[2],[3],[4],[5],[6],[7],[8]])
marks_obtained=np.array([20,25,30,35,50,60,70,80])

In [14]:
# Step 2
model=LinearRegression()

In [15]:
# Step 3
model.fit(X=hours_studied,y=marks_obtained)

In [16]:
# Step 4
print()
predicted_marks=model.predict([[9]])
print('Predicted Marks for 9 Hours: \t ',predicted_marks)


Predicted Marks for 9 Hours: 	  [86.42857143]
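
The fitted straight line itself can be read from the model. The sketch below rebuilds the same data so it runs on its own, then prints the slope and intercept from the standard `coef_` and `intercept_` attributes. The names `slope`, `intercept`, and `manual` are illustrative.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

hours_studied = np.array([[1], [2], [3], [4], [5], [6], [7], [8]])
marks_obtained = np.array([20, 25, 30, 35, 50, 60, 70, 80])

model = LinearRegression().fit(hours_studied, marks_obtained)

slope = model.coef_[0]        # m: extra marks per additional hour
intercept = model.intercept_  # b: predicted marks at 0 hours
print(f'marks = {slope:.2f} * hours + {intercept:.2f}')

# Rebuild the 9-hour prediction by hand from the line equation
manual = slope * 9 + intercept
print(round(manual, 2))  # about 86.43, the same as model.predict([[9]])
```

This shows that `predict()` is simply evaluating `slope * hours + intercept` for the given input.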


In [24]:
print()
hour_study=float(input('Number of Hours Studied to Predict Marks: \t '))
p_marks=model.predict([[hour_study]])
print(f'Predicted Marks for {hour_study} Hours: \t {p_marks}')


Number of Hours Studied to Predict Marks: 	  8.5


Predicted Marks for 8.5 Hours: 	 [81.96428571]
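
Step 5 in the list above (visualization) is not covered by the cells. A minimal sketch with matplotlib could look like the following; matplotlib is an assumption here, since the original does not name a plotting library, and the labels and colors are illustrative.

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression

hours_studied = np.array([[1], [2], [3], [4], [5], [6], [7], [8]])
marks_obtained = np.array([20, 25, 30, 35, 50, 60, 70, 80])

model = LinearRegression().fit(hours_studied, marks_obtained)

fig, ax = plt.subplots()

# Actual data points
ax.scatter(hours_studied.ravel(), marks_obtained, label='Actual marks')

# Best-fit line: the model's predictions for the same hours
ax.plot(hours_studied.ravel(), model.predict(hours_studied),
        color='red', label='Best-fit line')

ax.set_xlabel('Hours studied')
ax.set_ylabel('Marks obtained')
ax.set_title('Hours Studied vs. Marks Obtained')
ax.legend()
```

In a notebook the figure is displayed when the cell finishes; in a plain script, add `plt.show()` at the end.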
