In [1]:
import numpy as np
from sklearn import preprocessing
from sklearn.datasets import fetch_california_housing

# load the dataset; as_frame=True returns the features as a pandas DataFrame
california_housing = fetch_california_housing(as_frame=True)

# print the dataset description
print(california_housing.DESCR)

.. _california_housing_dataset:

California Housing dataset
--------------------------

**Data Set Characteristics:**

    :Number of Instances: 20640

    :Number of Attributes: 8 numeric, predictive attributes and the target

    :Attribute Information:
        - MedInc        median income in block group
        - HouseAge      median house age in block group
        - AveRooms      average number of rooms per household
        - AveBedrms     average number of bedrooms per household
        - Population    block group population
        - AveOccup      average number of household members
        - Latitude      block group latitude
        - Longitude     block group longitude

    :Missing Attribute Values: None

This dataset was obtained from the StatLib repository.
https://www.dcc.fc.up.pt/~ltorgo/Regression/cal_housing.html

The target variable is the median house value for California districts,
expressed in hundreds of thousands of dollars ($100,000).

This dataset was derived

### Normalizing Columns from a DataFrame Using the normalize() Function

In [2]:
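x_array = np.array(california_housing.data['HouseAge'])
print("HouseAge array: ", x_array)

# normalize() expects a 2-D input, so the 1-D column is wrapped in a list to form a single row;
# with the default norm='l2', every value is divided by the row's Euclidean length
normalized_arr = preprocessing.normalize([x_array])
print("Normalized HouseAge array: ", normalized_arr)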

HouseAge array:  [41. 21. 52. ... 17. 18. 16.]
Normalized HouseAge array:  [[0.00912272 0.00467261 0.01157028 ... 0.00378259 0.0040051  0.00356009]]
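To make the default `norm='l2'` behaviour of `normalize()` concrete, the sketch below reproduces it by hand with NumPy: each value is divided by the Euclidean length of the whole row, so the rescaled vector has length 1. It uses a short hypothetical sample of HouseAge values rather than the full 20,640-row column, so the numbers differ from the output above.

```python
import numpy as np

# Hypothetical sample of HouseAge values (not the full column)
x = np.array([41.0, 21.0, 52.0, 17.0, 18.0, 16.0])

# L2 normalization: divide every value by the Euclidean length of the row
l2_norm = np.sqrt(np.sum(x ** 2))
x_unit = x / l2_norm

print(x_unit)
print(np.linalg.norm(x_unit))  # length of the rescaled vector is 1 (up to rounding)
```

Applied to the full column, the same division explains the printed values: every entry is the original age divided by one shared constant, the L2 norm of the entire HouseAge column.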


### Normalizing Datasets by Row or by Column Using the normalize() Function

In [3]:
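import pandas as pd

# reload the dataset (already loaded in In [1]; repeated here so this cell runs on its own)
california_housing = fetch_california_housing(as_frame=True)

# axis=0 normalizes each column (feature) independently instead of each row
d = preprocessing.normalize(california_housing.data, axis=0)
scaled_df = pd.DataFrame(d, columns=california_housing.data.columns)
print(scaled_df)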

         MedInc  HouseAge  AveRooms  AveBedrms  Population  AveOccup  \
0      0.013440  0.009123  0.008148   0.005965    0.001231  0.001642   
1      0.013401  0.004673  0.007278   0.005662    0.009180  0.001356   
2      0.011716  0.011570  0.009670   0.006254    0.001896  0.001801   
3      0.009110  0.011570  0.006787   0.006252    0.002133  0.001638   
4      0.006209  0.011570  0.007329   0.006299    0.002160  0.001402   
...         ...       ...       ...        ...         ...       ...   
20635  0.002519  0.005563  0.005886   0.006603    0.003231  0.001646   
20636  0.004128  0.004005  0.007133   0.007666    0.001361  0.002007   
20637  0.002744  0.003783  0.006073   0.006526    0.003850  0.001495   
20638  0.003014  0.004005  0.006218   0.006828    0.002833  0.001365   
20639  0.003856  0.003560  0.006131   0.006772    0.005303  0.001682   

       Latitude  Longitude  
0      0.007386  -0.007114  
1      0.007383  -0.007114  
2      0.007381  -0.007115  
3      0.007381  -0

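The normalized "HouseAge" column is the same as the result of the previous cell (for example, 0.009123 in row 0 here versus 0.00912272 above), because `axis=0` divides each column by its own L2 norm.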
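The difference between `axis=0` and the default `axis=1` can be seen on a tiny matrix. The NumPy sketch below (a hand-written equivalent, using a made-up 3×2 array) divides by column norms in one case and by row norms in the other; after column-wise scaling every column has length 1, after row-wise scaling every row does.

```python
import numpy as np

# Made-up 3x2 matrix: rows are samples, columns are features
X = np.array([[3.0, 1.0],
              [4.0, 2.0],
              [0.0, 2.0]])

# axis=0: divide each column by its own L2 norm (what cell [3] does per feature)
col_norms = np.linalg.norm(X, axis=0)            # [5.0, 3.0]
X_cols = X / col_norms

# axis=1 (normalize()'s default): divide each row by its own L2 norm
row_norms = np.linalg.norm(X, axis=1, keepdims=True)
X_rows = X / row_norms

print(X_cols)
print(X_rows)
```

Column-wise scaling keeps features comparable across samples, while row-wise scaling is the usual choice when each sample should be treated as a direction (for example, before cosine-similarity comparisons).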

### Using the scikit-learn preprocessing.MinMaxScaler() Function to Normalize Data

In [4]:
scaler = preprocessing.MinMaxScaler()
# alternatively, specify the output range (min, max) explicitly:
# scaler = preprocessing.MinMaxScaler(feature_range=(0, 2))

d = scaler.fit_transform(california_housing.data)
scaled_df = pd.DataFrame(d, columns=california_housing.data.columns)
print(scaled_df)

         MedInc  HouseAge  AveRooms  AveBedrms  Population  AveOccup  \
0      0.539668  0.784314  0.043512   0.020469    0.008941  0.001499   
1      0.538027  0.392157  0.038224   0.018929    0.067210  0.001141   
2      0.466028  1.000000  0.052756   0.021940    0.013818  0.001698   
3      0.354699  1.000000  0.035241   0.021929    0.015555  0.001493   
4      0.230776  1.000000  0.038534   0.022166    0.015752  0.001198   
...         ...       ...       ...        ...         ...       ...   
20635  0.073130  0.470588  0.029769   0.023715    0.023599  0.001503   
20636  0.141853  0.333333  0.037344   0.029124    0.009894  0.001956   
20637  0.082764  0.313725  0.030904   0.023323    0.028140  0.001314   
20638  0.094295  0.333333  0.031783   0.024859    0.020684  0.001152   
20639  0.130253  0.294118  0.031252   0.024573    0.038790  0.001549   

       Latitude  Longitude  
0      0.567481   0.211155  
1      0.565356   0.212151  
2      0.564293   0.210159  
3      0.564293   0
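`MinMaxScaler` rescales each column linearly so that its minimum maps to the low end of `feature_range` and its maximum to the high end: x' = (x − min) / (max − min) · (hi − lo) + lo. The HouseAge output above is consistent with a column range of 1 to 52 (52 → 1.0, 41 → 40/51 ≈ 0.784314). The function below is a hypothetical hand-written `min_max_scale` helper, a sketch of that formula rather than scikit-learn's own code; its handling of constant columns is an assumption.

```python
import numpy as np

def min_max_scale(X, feature_range=(0.0, 1.0)):
    """Rescale each column of X linearly into feature_range (column-wise min-max)."""
    X = np.asarray(X, dtype=float)
    lo, hi = feature_range
    col_min = X.min(axis=0)
    col_range = X.max(axis=0) - col_min
    # Assumption: a constant column gets a range of 1 so it maps to `lo` instead of dividing by zero
    col_range[col_range == 0] = 1.0
    X_std = (X - col_min) / col_range      # now in [0, 1]
    return X_std * (hi - lo) + lo          # stretch and shift into [lo, hi]

# Hypothetical HouseAge sample that includes the column's minimum (1) and maximum (52)
house_age = np.array([[41.0], [21.0], [52.0], [1.0]])

scaled = min_max_scale(house_age)                            # default range (0, 1)
scaled_0_2 = min_max_scale(house_age, feature_range=(0, 2))  # like the commented-out option
print(scaled.ravel())      # 41 -> 40/51 ≈ 0.784314, 21 -> 20/51 ≈ 0.392157
print(scaled_0_2.ravel())
```

Unlike `normalize()`, which divides by a norm, min-max scaling subtracts the minimum first, so every scaled column spans exactly the requested range.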