## Quick peek at your data
This tutorial uses a version of the Salads dataset that is stored in a public Cloud Storage bucket and described by a CSV index file.

Start by taking a quick look at the data. Count the examples by counting the rows in the CSV index file (`wc -l`), then view the first few rows.

```python
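# Count the rows in the CSV index file (one row per example)
count = ! gsutil cat $IMPORT_FILE | wc -l
print("Number of Examples", int(count[0]))

# Show the first 10 rows of the index file
print("First 10 rows")
! gsutil cat $IMPORT_FILE | head -10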
```
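The cell above relies on notebook shell magics. As a rough sketch of the same count-and-peek step in plain Python, the example below reads CSV lines from any iterable. The helper name `peek_csv` and the sample rows are hypothetical and only stand in for a locally available copy of the index file; the real file's columns will differ.

```python
import csv
import io
from itertools import islice


def peek_csv(lines, n=10):
    """Return (row_count, first_n_rows) for an iterable of CSV lines."""
    reader = csv.reader(lines)
    # Keep the first n rows for display.
    first_rows = list(islice(reader, n))
    # Total = rows already taken plus whatever is left in the reader.
    row_count = len(first_rows) + sum(1 for _ in reader)
    return row_count, first_rows


# Hypothetical stand-in data, not the actual index file contents.
sample = io.StringIO(
    "path/a.jpg,label_a\n"
    "path/b.jpg,label_b\n"
    "path/c.jpg,label_a\n"
)

row_count, first_rows = peek_csv(sample, n=2)
print("Number of Examples", row_count)
for row in first_rows:
    print(row)
```

Note that `wc -l` counts newline characters while `csv.reader` counts records, so the two can disagree when a file has quoted multi-line fields or no trailing newline.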

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Read the data
X_train_full = pd.read_csv('../input/train.csv', index_col='Id')
X_test_full = pd.read_csv('../input/test.csv', index_col='Id')

# Remove rows with a missing target
X_train_full.dropna(axis=0, subset=['SalePrice'], inplace=True)

# Separate the target (the value to predict) from the predictors
y = X_train_full.SalePrice
X_train_full.drop(['SalePrice'], axis=1, inplace=True)

# To keep things simple, use only numerical predictors
X = X_train_full.select_dtypes(exclude=['object'])
X_test = X_test_full.select_dtypes(exclude=['object'])

# Break off a validation set from the training data
X_train, X_valid, y_train, y_valid = train_test_split(X, y, train_size=0.8, test_size=0.2,
                                                      random_state=0)

# Print the first five rows of the data
print(X_train_full.head())

# Shape of the training data (num_rows, num_columns)
print(X_train_full.shape)

# Number of missing values in each column that has at least one
missing_val_count_by_column = (X_train_full.isnull().sum())
print(missing_val_count_by_column[missing_val_count_by_column > 0])
```
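Raw counts can be hard to judge without the table size, so it may help to report the share of missing values next to each count. The following is a minimal sketch on a made-up frame; the helper `missing_summary` and the column names `col_a`, `col_b`, `col_c` are hypothetical and not part of the dataset.

```python
import numpy as np
import pandas as pd


def missing_summary(df):
    """Return missing-value counts and fractions for columns with any gaps."""
    counts = df.isnull().sum()
    summary = pd.DataFrame({
        "missing": counts,
        "fraction": counts / len(df),
    })
    # Keep only columns with gaps, worst first.
    return summary[summary["missing"] > 0].sort_values("missing", ascending=False)


# Made-up data for illustration only.
toy = pd.DataFrame({
    "col_a": [1.0, np.nan, 3.0, np.nan],
    "col_b": [10, 20, 30, 40],
    "col_c": [0.5, 0.7, np.nan, 0.9],
})

summary = missing_summary(toy)
print(summary)
```

The same helper could be called on `X_train_full` to see which columns are mostly empty before choosing how to handle them.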