# Importing data for supervised learning

#### EXERCISE:
In this chapter, you will work with <a href="https://www.gapminder.org/data/" target="_blank" rel="noopener noreferrer">Gapminder</a> data that we have consolidated into one CSV file available in the workspace as <code>'gapminder.csv'</code>. Specifically, your goal will be to use this data to predict the life expectancy in a given country based on features such as the country's GDP, fertility rate, and population. As in Chapter 1, the dataset has been preprocessed.

Since the target variable here is quantitative, this is a regression problem. To begin, you will fit a linear regression with just one feature: <code>'fertility'</code>, which is the average number of children a woman in a given country gives birth to. In later exercises, you will use all the features to build regression models.

Before that, however, you need to import the data and get it into the form scikit-learn needs, which means creating feature and target variable arrays. Because you will start with only one feature, you also need to reshape the arrays with NumPy's <code>.reshape()</code> method: scikit-learn expects the feature array to be two-dimensional, with one row per sample and one column per feature. Don't worry too much about this reshaping right now, but you will need it occasionally when working with scikit-learn, so it is useful to practice.
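
To make the reshaping step concrete, here is a minimal sketch using a small made-up array (the values are illustrative and are not taken from the Gapminder file). It shows how <code>.reshape(-1, 1)</code> turns a one-dimensional array into a two-dimensional array with a single column, where <code>-1</code> tells NumPy to infer the number of rows.

```python
import numpy as np

# Hypothetical fertility values, for illustration only
fertility = np.array([2.7, 6.4, 2.2, 1.4])

# A 1-D array has a single axis
print(fertility.shape)       # (4,)

# -1 lets NumPy infer the row count; 1 fixes a single column
fertility_2d = fertility.reshape(-1, 1)

print(fertility_2d.shape)    # (4, 1)
```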

#### INSTRUCTIONS:
* Import <code>numpy</code> and <code>pandas</code> as their standard aliases.
* Read the file <code>'gapminder.csv'</code> into a DataFrame <code>df</code> using the <code>read_csv()</code> function.
* Create array <code>X</code> for the <code>'fertility'</code> feature and array <code>y</code> for the <code>'life'</code> target variable.
* Reshape the arrays by using the <code>.reshape()</code> method and passing in <code>-1</code> and <code>1</code>.
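
The steps above can also be tried on an in-memory CSV, which is handy for experimenting without the workspace file. In this sketch the three data rows are invented for illustration, and only the column names <code>'fertility'</code> and <code>'life'</code> come from the exercise. It also shows that selecting a column with double brackets yields a two-dimensional array directly, which is one possible alternative to calling <code>.reshape()</code>.

```python
import io

import numpy as np
import pandas as pd

# Made-up rows standing in for 'gapminder.csv'; only the column names are from the exercise
csv_text = "fertility,life\n2.7,75.3\n6.4,58.3\n2.2,75.5\n"
df = pd.read_csv(io.StringIO(csv_text))

# Single brackets give a 1-D array, which then needs reshaping
X = df["fertility"].values.reshape(-1, 1)
y = df["life"].values.reshape(-1, 1)

# Double brackets select a one-column DataFrame, so the array is already 2-D
X_alt = df[["fertility"]].values

print(X.shape, y.shape, X_alt.shape)
print(np.array_equal(X, X_alt))
```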

#### SCRIPT.PY:

# Import numpy and pandas
import numpy as np
import pandas as pd

# Read the CSV file into a DataFrame: df
df = pd.read_csv("gapminder.csv")

# Create arrays for features and target variable
y = df["life"].values
X = df["fertility"].values

# Print the dimensions of X and y before reshaping
print("Dimensions of y before reshaping: {}".format(y.shape))
print("Dimensions of X before reshaping: {}".format(X.shape))

# Reshape X and y
y = y.reshape(-1, 1)
X = X.reshape(-1, 1)

# Print the dimensions of X and y after reshaping
print("Dimensions of y after reshaping: {}".format(y.shape))
print("Dimensions of X after reshaping: {}".format(X.shape))

Dimensions of y before reshaping: (139,)
Dimensions of X before reshaping: (139,)
Dimensions of y after reshaping: (139, 1)
Dimensions of X after reshaping: (139, 1)
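
The reshaped feature array is now in the form scikit-learn estimators expect. The sketch below uses synthetic data rather than the Gapminder values to illustrate why the reshape matters: <code>LinearRegression</code> accepts the <code>(n, 1)</code> feature array but raises a <code>ValueError</code> when given the one-dimensional version. The variable names here are illustrative and are not part of the exercise.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic data for illustration: y = 2x + 1 exactly
x_1d = np.array([1.0, 2.0, 3.0, 4.0])
y = 2.0 * x_1d + 1.0

# A 1-D feature array is rejected by scikit-learn
try:
    LinearRegression().fit(x_1d, y)
    rejected = False
except ValueError:
    rejected = True
print(rejected)

# After reshaping to (n_samples, 1), fitting works
X = x_1d.reshape(-1, 1)
reg = LinearRegression().fit(X, y)
print(reg.coef_, reg.intercept_)
```

In this sketch the target is left one-dimensional, which scikit-learn also accepts. The feature array is the one that has to be two-dimensional.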
