# Working with data in Python

Data is usually stored as **tables** (MS Excel, SQL databases, Hadoop MapReduce, etc.).

Examples of operations on data:
1. reading and writing
2. viewing (columns, rows, names) and indexing
3. statistics, aggregate functions
4. computing new values
5. adding/changing/deleting rows and columns
6. searching and filtering
7. sorting
8. visualization
9. renaming rows/columns
10. joining several tables
11. grouping values and aggregating

## The CSV format

**C**omma **S**eparated **V**alues: values separated by commas (or by another delimiter).

As an example, consider a dataset with detailed descriptions of residential homes in Iowa, USA (source: the [House Prices: Advanced Regression Techniques](https://www.kaggle.com/c/house-prices-advanced-regression-techniques) competition on [Kaggle](https://www.kaggle.com)).

In [1]:
with open('train.csv', 'r') as datafile:
    data_lines = datafile.read().splitlines()

In [2]:
print data_lines[0]

Id,MSSubClass,MSZoning,LotFrontage,LotArea,Street,Alley,LotShape,LandContour,Utilities,LotConfig,LandSlope,Neighborhood,Condition1,Condition2,BldgType,HouseStyle,OverallQual,OverallCond,YearBuilt,YearRemodAdd,RoofStyle,RoofMatl,Exterior1st,Exterior2nd,MasVnrType,MasVnrArea,ExterQual,ExterCond,Foundation,BsmtQual,BsmtCond,BsmtExposure,BsmtFinType1,BsmtFinSF1,BsmtFinType2,BsmtFinSF2,BsmtUnfSF,TotalBsmtSF,Heating,HeatingQC,CentralAir,Electrical,1stFlrSF,2ndFlrSF,LowQualFinSF,GrLivArea,BsmtFullBath,BsmtHalfBath,FullBath,HalfBath,BedroomAbvGr,KitchenAbvGr,KitchenQual,TotRmsAbvGrd,Functional,Fireplaces,FireplaceQu,GarageType,GarageYrBlt,GarageFinish,GarageCars,GarageArea,GarageQual,GarageCond,PavedDrive,WoodDeckSF,OpenPorchSF,EnclosedPorch,3SsnPorch,ScreenPorch,PoolArea,PoolQC,Fence,MiscFeature,MiscVal,MoSold,YrSold,SaleType,SaleCondition,SalePrice


In [5]:
print data_lines[3]

3,60,RL,68,11250,Pave,NA,IR1,Lvl,AllPub,Inside,Gtl,CollgCr,Norm,Norm,1Fam,2Story,7,5,2001,2002,Gable,CompShg,VinylSd,VinylSd,BrkFace,162,Gd,TA,PConc,Gd,TA,Mn,GLQ,486,Unf,0,434,920,GasA,Ex,Y,SBrkr,920,866,0,1786,1,0,2,1,3,1,Gd,6,Typ,1,TA,Attchd,2001,RFn,2,608,TA,TA,Y,0,42,0,0,0,0,NA,NA,NA,0,9,2008,WD,Normal,223500
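
Splitting such lines on commas by hand becomes fragile as soon as a field contains a quoted comma, so the standard library's `csv` module is the usual tool for raw CSV. Below is a minimal sketch, assuming Python 3; the in-memory string `sample` is a shortened excerpt of the two lines above and stands in for `train.csv`, and the variable names are illustrative.

```python
import csv
import io

# Shortened excerpt of the header and the Id=3 line shown above
# (only the first seven columns, for illustration).
sample = "Id,MSSubClass,MSZoning,LotFrontage,LotArea,Street,Alley\n3,60,RL,68,11250,Pave,NA\n"

# DictReader takes the first line as the field names.
reader = csv.DictReader(io.StringIO(sample))
rows = list(reader)

print(rows[0]["LotArea"])   # every value comes back as a string
print(rows[0]["Alley"])     # "NA" stays a plain string, not a missing value
```

Every field is returned as text and `NA` gets no special treatment, so type conversion and missing-value handling are left to the caller.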


## The Pandas library

[Pandas](http://pandas.pydata.org) is not part of the Python standard library. It ships with Anaconda; with a plain Python installation you need to install it yourself:

In [6]:
!pip install pandas



In [8]:
import pandas as pd

In [9]:
print pd.__version__

0.20.2


Key "entities":
- `pd.DataFrame` – a table (two-dimensional data)
- `pd.Series` – a column or a row (one-dimensional data)
- `pd.Index` – an index (the list of row or column labels)

## Reading and writing a table from a CSV file

Important parameters:
- `sep=','` – the delimiter (other common choices are `'\t'`, `';'`, `' '`, etc.)
- `decimal='.'` – the character that separates the fractional part (files from a Russian locale often use `','`)
- `encoding='utf8'` – the encoding (on Windows it is often `'cp1251'`)

In [14]:
df = pd.read_csv('train.csv', nrows=10)

Additional values to be treated as missing can be listed with the `na_values` parameter: for example, `na_values=['-', 'missing']` makes dashes and the string `missing` turn into `NaN` as well.
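
To see how `sep`, `decimal` and `na_values` interact, here is a small sketch on in-memory data rather than on `train.csv`. The file contents, the column names and the variable `demo` are all made up for illustration, and Python 3 is assumed.

```python
import io

import pandas as pd

# Hypothetical file contents: semicolon-separated, decimal commas,
# and two custom markers for missing values ('-' and 'missing').
raw = "id;area;price\n1;65,5;100\n2;-;200\n3;missing;300\n"

demo = pd.read_csv(
    io.StringIO(raw),
    sep=';',
    decimal=',',
    na_values=['-', 'missing'],
    index_col='id',
)

print(demo)
print(demo.dtypes)  # 'area' is parsed as float because it now holds NaN
```

With these settings `65,5` is read as the number 65.5, and both markers become `NaN`.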

In [15]:
type(df)

pandas.core.frame.DataFrame

In [16]:
df.to_csv('train_head.csv', sep=';')

## Viewing a table and indexing

In [17]:
df = pd.read_csv('train.csv', nrows=100, index_col='Id')

### Part of a table

In [24]:
df

Id,MSSubClass,MSZoning,LotFrontage,LotArea,Street,Alley,LotShape,LandContour,Utilities,LotConfig,LandSlope,Neighborhood,Condition1,Condition2,BldgType,...,WoodDeckSF,OpenPorchSF,EnclosedPorch,3SsnPorch,ScreenPorch,PoolArea,PoolQC,Fence,MiscFeature,MiscVal,MoSold,YrSold,SaleType,SaleCondition,SalePrice
1,60,RL,65.0,8450,Pave,,Reg,Lvl,AllPub,Inside,Gtl,CollgCr,Norm,Norm,1Fam,...,0,61,0,0,0,0,,,,0,2,2008,WD,Normal,208500
2,20,RL,80.0,9600,Pave,,Reg,Lvl,AllPub,FR2,Gtl,Veenker,Feedr,Norm,1Fam,...,298,0,0,0,0,0,,,,0,5,2007,WD,Normal,181500
3,60,RL,68.0,11250,Pave,,IR1,Lvl,AllPub,Inside,Gtl,CollgCr,Norm,Norm,1Fam,...,0,42,0,0,0,0,,,,0,9,2008,WD,Normal,223500
4,70,RL,60.0,9550,Pave,,IR1,Lvl,AllPub,Corner,Gtl,Crawfor,Norm,Norm,1Fam,...,0,35,272,0,0,0,,,,0,2,2006,WD,Abnorml,140000
5,60,RL,84.0,14260,Pave,,IR1,Lvl,AllPub,FR2,Gtl,NoRidge,Norm,Norm,1Fam,...,192,84,0,0,0,0,,,,0,12,2008,WD,Normal,250000
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
96,60,RL,,9765,Pave,,IR2,Lvl,AllPub,Corner,Gtl,Gilbert,Norm,Norm,1Fam,...,232,63,0,0,0,0,,,Shed,480,4,2009,WD,Normal,185000
97,20,RL,78.0,10264,Pave,,IR1,Lvl,AllPub,Inside,Gtl,CollgCr,Norm,Norm,1Fam,...,158,29,0,0,0,0,,,,0,8,2006,WD,Normal,214000
98,20,RL,73.0,10921,Pave,,Reg,HLS,AllPub,Inside,Gtl,Edwards,Norm,Norm,1Fam,...,120,0,0,0,0,0,,,,0,5,2007,WD,Normal,94750
99,30,RL,85.0,10625,Pave,,Reg,Lvl,AllPub,Corner,Gtl,Edwards,Norm,Norm,1Fam,...,0,0,77,0,0,0,,,Shed,400,5,2010,COD,Abnorml,83000


In [23]:
pd.set_option('display.max_rows', 10)
pd.set_option('display.max_columns', 30)

In [26]:
df.head(3)

Id,MSSubClass,MSZoning,LotFrontage,LotArea,Street,Alley,LotShape,LandContour,Utilities,LotConfig,LandSlope,Neighborhood,Condition1,Condition2,BldgType,...,WoodDeckSF,OpenPorchSF,EnclosedPorch,3SsnPorch,ScreenPorch,PoolArea,PoolQC,Fence,MiscFeature,MiscVal,MoSold,YrSold,SaleType,SaleCondition,SalePrice
1,60,RL,65.0,8450,Pave,,Reg,Lvl,AllPub,Inside,Gtl,CollgCr,Norm,Norm,1Fam,...,0,61,0,0,0,0,,,,0,2,2008,WD,Normal,208500
2,20,RL,80.0,9600,Pave,,Reg,Lvl,AllPub,FR2,Gtl,Veenker,Feedr,Norm,1Fam,...,298,0,0,0,0,0,,,,0,5,2007,WD,Normal,181500
3,60,RL,68.0,11250,Pave,,IR1,Lvl,AllPub,Inside,Gtl,CollgCr,Norm,Norm,1Fam,...,0,42,0,0,0,0,,,,0,9,2008,WD,Normal,223500


In [28]:
df.tail(2)

Id,MSSubClass,MSZoning,LotFrontage,LotArea,Street,Alley,LotShape,LandContour,Utilities,LotConfig,LandSlope,Neighborhood,Condition1,Condition2,BldgType,...,WoodDeckSF,OpenPorchSF,EnclosedPorch,3SsnPorch,ScreenPorch,PoolArea,PoolQC,Fence,MiscFeature,MiscVal,MoSold,YrSold,SaleType,SaleCondition,SalePrice
99,30,RL,85.0,10625,Pave,,Reg,Lvl,AllPub,Corner,Gtl,Edwards,Norm,Norm,1Fam,...,0,0,77,0,0,0,,,Shed,400,5,2010,COD,Abnorml,83000
100,20,RL,77.0,9320,Pave,,IR1,Lvl,AllPub,Inside,Gtl,NAmes,Norm,Norm,1Fam,...,352,0,0,0,0,0,,,Shed,400,1,2010,WD,Normal,128950


In [30]:
df.sample(5)

Id,MSSubClass,MSZoning,LotFrontage,LotArea,Street,Alley,LotShape,LandContour,Utilities,LotConfig,LandSlope,Neighborhood,Condition1,Condition2,BldgType,...,WoodDeckSF,OpenPorchSF,EnclosedPorch,3SsnPorch,ScreenPorch,PoolArea,PoolQC,Fence,MiscFeature,MiscVal,MoSold,YrSold,SaleType,SaleCondition,SalePrice
68,20,RL,72.0,10665,Pave,,IR1,Lvl,AllPub,Inside,Gtl,CollgCr,Norm,Norm,1Fam,...,144,29,0,0,0,0,,,,0,6,2007,WD,Normal,226000
31,70,C (all),50.0,8500,Pave,Pave,Reg,Lvl,AllPub,Inside,Gtl,IDOTRR,Feedr,Norm,1Fam,...,0,54,172,0,0,0,,MnPrv,,0,7,2008,WD,Normal,40000
82,120,RM,32.0,4500,Pave,,Reg,Lvl,AllPub,FR2,Gtl,Mitchel,Norm,Norm,TwnhsE,...,0,199,0,0,0,0,,,,0,3,2006,WD,Normal,153500
33,20,RL,85.0,11049,Pave,,Reg,Lvl,AllPub,Corner,Gtl,CollgCr,Norm,Norm,1Fam,...,0,30,0,0,0,0,,,,0,1,2008,WD,Normal,179900
78,50,RM,50.0,8635,Pave,,Reg,Lvl,AllPub,Inside,Gtl,BrkSide,Norm,Norm,1Fam,...,0,0,0,0,0,0,,MnPrv,,0,1,2008,WD,Normal,127000


### Index and column names

In [31]:
df.shape

(100, 80)

In [32]:
df.index

Int64Index([  1,   2,   3,   4,   5,   6,   7,   8,   9,  10,  11,  12,  13,
             14,  15,  16,  17,  18,  19,  20,  21,  22,  23,  24,  25,  26,
             27,  28,  29,  30,  31,  32,  33,  34,  35,  36,  37,  38,  39,
             40,  41,  42,  43,  44,  45,  46,  47,  48,  49,  50,  51,  52,
             53,  54,  55,  56,  57,  58,  59,  60,  61,  62,  63,  64,  65,
             66,  67,  68,  69,  70,  71,  72,  73,  74,  75,  76,  77,  78,
             79,  80,  81,  82,  83,  84,  85,  86,  87,  88,  89,  90,  91,
             92,  93,  94,  95,  96,  97,  98,  99, 100],
           dtype='int64', name=u'Id')

In [33]:
df.columns

Index([u'MSSubClass', u'MSZoning', u'LotFrontage', u'LotArea', u'Street',
       u'Alley', u'LotShape', u'LandContour', u'Utilities', u'LotConfig',
       u'LandSlope', u'Neighborhood', u'Condition1', u'Condition2',
       u'BldgType', u'HouseStyle', u'OverallQual', u'OverallCond',
       u'YearBuilt', u'YearRemodAdd', u'RoofStyle', u'RoofMatl',
       u'Exterior1st', u'Exterior2nd', u'MasVnrType', u'MasVnrArea',
       u'ExterQual', u'ExterCond', u'Foundation', u'BsmtQual', u'BsmtCond',
       u'BsmtExposure', u'BsmtFinType1', u'BsmtFinSF1', u'BsmtFinType2',
       u'BsmtFinSF2', u'BsmtUnfSF', u'TotalBsmtSF', u'Heating', u'HeatingQC',
       u'CentralAir', u'Electrical', u'1stFlrSF', u'2ndFlrSF', u'LowQualFinSF',
       u'GrLivArea', u'BsmtFullBath', u'BsmtHalfBath', u'FullBath',
       u'HalfBath', u'BedroomAbvGr', u'KitchenAbvGr', u'KitchenQual',
       u'TotRmsAbvGrd', u'Functional', u'Fireplaces', u'FireplaceQu',
       u'GarageType', u'GarageYrBlt', u'GarageFinish', u'GarageCars'

In [208]:
df.sample(5).index

Int64Index([3, 76, 23, 85, 21], dtype='int64', name=u'Id')

In [36]:
df.T

Id,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,...,86,87,88,89,90,91,92,93,94,95,96,97,98,99,100
MSSubClass,60,20,60,70,60,50,20,60,50,190,20,60,20,20,20,...,60,60,160,50,20,20,20,30,190,60,60,20,20,30,20
MSZoning,RL,RL,RL,RL,RL,RL,RL,RL,RM,RL,RL,RL,RL,RL,RL,...,RL,RL,FV,C (all),RL,RL,RL,RL,C (all),RL,RL,RL,RL,RL,RL
LotFrontage,65,80,68,60,84,85,75,,51,50,70,85,,91,,...,121,122,40,105,60,60,85,80,60,69,,78,73,85,77
LotArea,8450,9600,11250,9550,14260,14115,10084,10382,6120,7420,11200,11924,12968,10652,10920,...,16059,11911,3951,8470,8070,7200,8500,13360,7200,9337,9765,10264,10921,10625,9320
Street,Pave,Pave,Pave,Pave,Pave,Pave,Pave,Pave,Pave,Pave,Pave,Pave,Pave,Pave,Pave,...,Pave,Pave,Pave,Pave,Pave,Pave,Pave,Pave,Pave,Pave,Pave,Pave,Pave,Pave,Pave
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
MoSold,2,5,9,2,12,10,8,11,4,1,2,7,9,8,5,...,4,3,6,10,8,7,12,8,11,5,4,8,5,5,1
YrSold,2008,2007,2008,2006,2008,2009,2007,2009,2008,2008,2008,2006,2008,2007,2008,...,2006,2009,2009,2009,2007,2006,2006,2009,2007,2007,2009,2006,2007,2010,2010
SaleType,WD,WD,WD,WD,WD,WD,WD,WD,WD,WD,WD,New,WD,New,WD,...,WD,WD,New,ConLD,WD,WD,WD,WD,WD,WD,WD,WD,WD,COD,WD
SaleCondition,Normal,Normal,Normal,Abnorml,Normal,Normal,Normal,Normal,Abnorml,Normal,Normal,Partial,Normal,Partial,Normal,...,Normal,Normal,Partial,Abnorml,Normal,Normal,Abnorml,Normal,Normal,Normal,Normal,Normal,Normal,Abnorml,Normal


In [37]:
print df.T.shape
print df.T.columns

(80, 100)
Int64Index([  1,   2,   3,   4,   5,   6,   7,   8,   9,  10,  11,  12,  13,
             14,  15,  16,  17,  18,  19,  20,  21,  22,  23,  24,  25,  26,
             27,  28,  29,  30,  31,  32,  33,  34,  35,  36,  37,  38,  39,
             40,  41,  42,  43,  44,  45,  46,  47,  48,  49,  50,  51,  52,
             53,  54,  55,  56,  57,  58,  59,  60,  61,  62,  63,  64,  65,
             66,  67,  68,  69,  70,  71,  72,  73,  74,  75,  76,  77,  78,
             79,  80,  81,  82,  83,  84,  85,  86,  87,  88,  89,  90,  91,
             92,  93,  94,  95,  96,  97,  98,  99, 100],
           dtype='int64', name=u'Id')


### Basic indexing

[Official documentation](https://pandas.pydata.org/pandas-docs/stable/indexing.html)

#### Columns

In [38]:
df['SalePrice']

Id
1      208500
2      181500
3      223500
4      140000
5      250000
        ...  
96     185000
97     214000
98      94750
99      83000
100    128950
Name: SalePrice, Length: 100, dtype: int64

In [39]:
type(df['SalePrice'])

pandas.core.series.Series

In [40]:
df['SalePrice'].index

Int64Index([  1,   2,   3,   4,   5,   6,   7,   8,   9,  10,  11,  12,  13,
             14,  15,  16,  17,  18,  19,  20,  21,  22,  23,  24,  25,  26,
             27,  28,  29,  30,  31,  32,  33,  34,  35,  36,  37,  38,  39,
             40,  41,  42,  43,  44,  45,  46,  47,  48,  49,  50,  51,  52,
             53,  54,  55,  56,  57,  58,  59,  60,  61,  62,  63,  64,  65,
             66,  67,  68,  69,  70,  71,  72,  73,  74,  75,  76,  77,  78,
             79,  80,  81,  82,  83,  84,  85,  86,  87,  88,  89,  90,  91,
             92,  93,  94,  95,  96,  97,  98,  99, 100],
           dtype='int64', name=u'Id')

In [41]:
df['SalePrice'].columns

AttributeError: 'Series' object has no attribute 'columns'

In [42]:
df.SalePrice

Id
1      208500
2      181500
3      223500
4      140000
5      250000
        ...  
96     185000
97     214000
98      94750
99      83000
100    128950
Name: SalePrice, Length: 100, dtype: int64

In [43]:
df[['SaleType', 'SalePrice']]

Id,SaleType,SalePrice
1,WD,208500
2,WD,181500
3,WD,223500
4,WD,140000
5,WD,250000
...,...,...
96,WD,185000
97,WD,214000
98,WD,94750
99,COD,83000


Question: how does `df['SalePrice']` differ from `df[['SalePrice']]`?

In [210]:
print type(df['SalePrice'])
print df['SalePrice'].shape

<class 'pandas.core.series.Series'>
(100,)


In [211]:
print type(df[['SalePrice']])
print df[['SalePrice']].shape

<class 'pandas.core.frame.DataFrame'>
(100, 1)


#### Rows

In [50]:
df.loc[1]

MSSubClass           60
MSZoning             RL
LotFrontage          65
LotArea            8450
Street             Pave
                  ...  
MoSold                2
YrSold             2008
SaleType             WD
SaleCondition    Normal
SalePrice        208500
Name: 1, Length: 80, dtype: object

In [51]:
type(df.loc[1])

pandas.core.series.Series

In [52]:
df.loc[1].index

Index([u'MSSubClass', u'MSZoning', u'LotFrontage', u'LotArea', u'Street',
       u'Alley', u'LotShape', u'LandContour', u'Utilities', u'LotConfig',
       u'LandSlope', u'Neighborhood', u'Condition1', u'Condition2',
       u'BldgType', u'HouseStyle', u'OverallQual', u'OverallCond',
       u'YearBuilt', u'YearRemodAdd', u'RoofStyle', u'RoofMatl',
       u'Exterior1st', u'Exterior2nd', u'MasVnrType', u'MasVnrArea',
       u'ExterQual', u'ExterCond', u'Foundation', u'BsmtQual', u'BsmtCond',
       u'BsmtExposure', u'BsmtFinType1', u'BsmtFinSF1', u'BsmtFinType2',
       u'BsmtFinSF2', u'BsmtUnfSF', u'TotalBsmtSF', u'Heating', u'HeatingQC',
       u'CentralAir', u'Electrical', u'1stFlrSF', u'2ndFlrSF', u'LowQualFinSF',
       u'GrLivArea', u'BsmtFullBath', u'BsmtHalfBath', u'FullBath',
       u'HalfBath', u'BedroomAbvGr', u'KitchenAbvGr', u'KitchenQual',
       u'TotRmsAbvGrd', u'Functional', u'Fireplaces', u'FireplaceQu',
       u'GarageType', u'GarageYrBlt', u'GarageFinish', u'GarageCars'

In [53]:
df.loc[1].columns

AttributeError: 'Series' object has no attribute 'columns'

In [54]:
df.loc[[1, 3, 4]]

Id,MSSubClass,MSZoning,LotFrontage,LotArea,Street,Alley,LotShape,LandContour,Utilities,LotConfig,LandSlope,Neighborhood,Condition1,Condition2,BldgType,...,WoodDeckSF,OpenPorchSF,EnclosedPorch,3SsnPorch,ScreenPorch,PoolArea,PoolQC,Fence,MiscFeature,MiscVal,MoSold,YrSold,SaleType,SaleCondition,SalePrice
1,60,RL,65.0,8450,Pave,,Reg,Lvl,AllPub,Inside,Gtl,CollgCr,Norm,Norm,1Fam,...,0,61,0,0,0,0,,,,0,2,2008,WD,Normal,208500
3,60,RL,68.0,11250,Pave,,IR1,Lvl,AllPub,Inside,Gtl,CollgCr,Norm,Norm,1Fam,...,0,42,0,0,0,0,,,,0,9,2008,WD,Normal,223500
4,70,RL,60.0,9550,Pave,,IR1,Lvl,AllPub,Corner,Gtl,Crawfor,Norm,Norm,1Fam,...,0,35,272,0,0,0,,,,0,2,2006,WD,Abnorml,140000


#### Columns and rows

If both indexers are single values, we get a single scalar value:

In [58]:
df.loc[3, 'YrSold']

2008

If one of the indexers is a list of values, we get a `Series`:

In [61]:
df.loc[[1, 3, 4], 'SalePrice']

Id
1    208500
3    223500
4    140000
Name: SalePrice, dtype: int64

If both indexers are lists of values, we get a `DataFrame`:

In [64]:
df.loc[[1, 3, 4], ['SalePrice', 'YrSold']]

Id,SalePrice,YrSold
1,208500,2008
3,223500,2008
4,140000,2006


Slices (`slice`) can be used instead of lists:

In [65]:
df.loc[1:3, 'YrSold']

Id
1    2008
2    2007
3    2008
Name: YrSold, dtype: int64

In [66]:
df.loc[:5, 'YrSold':'SalePrice']

Id,YrSold,SaleType,SaleCondition,SalePrice
1,2008,WD,Normal,208500
2,2007,WD,Normal,181500
3,2008,WD,Normal,223500
4,2006,WD,Abnorml,140000
5,2008,WD,Normal,250000


An important difference from ordinary Python slices: a `.loc` slice **includes** its right endpoint.

#### Indexing by position

In [68]:
df.iloc[-1]

MSSubClass           20
MSZoning             RL
LotFrontage          77
LotArea            9320
Street             Pave
                  ...  
MoSold                1
YrSold             2010
SaleType             WD
SaleCondition    Normal
SalePrice        128950
Name: 100, Length: 80, dtype: object

In [69]:
df.iloc[0, :4]

MSSubClass       60
MSZoning         RL
LotFrontage      65
LotArea        8450
Name: 1, dtype: object

In [70]:
df.iloc[-3:, -1]

Id
98      94750
99      83000
100    128950
Name: SalePrice, dtype: int64

With `.iloc`, slices behave like standard Python slices and exclude the right endpoint.
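
The difference between label-based `.loc` slices and position-based `.iloc` slices is easy to miss when the index consists of integers. The sketch below puts the same `1:3` slice side by side on a toy frame; the frame `toy` is an illustration whose index mimics `Id` and whose prices are the first four `SalePrice` values shown earlier.

```python
import pandas as pd

# Toy frame with a 1-based integer index that mimics the Id index;
# the prices are the first four SalePrice values shown earlier.
toy = pd.DataFrame(
    {'SalePrice': [208500, 181500, 223500, 140000]},
    index=pd.Index([1, 2, 3, 4], name='Id'),
)

by_label = toy.loc[1:3, 'SalePrice']   # labels 1, 2 and 3: right end included
by_position = toy.iloc[1:3, 0]         # positions 1 and 2: right end excluded

print(by_label.index.tolist())     # [1, 2, 3]
print(by_position.index.tolist())  # [2, 3]
```

The same `1:3` selects three rows by label but only two rows by position, and the two selections start at different rows because positions are counted from zero.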

#### Summary

- `df[•]` for columns (or `df.•`)
- `df.loc[•]` for rows
- `df.loc[•, •]` for rows and columns
- `df.iloc[•, •]` for access by position

## Table information

In [71]:
df.info(memory_usage='deep')

<class 'pandas.core.frame.DataFrame'>
Int64Index: 100 entries, 1 to 100
Data columns (total 80 columns):
MSSubClass       100 non-null int64
MSZoning         100 non-null object
LotFrontage      86 non-null float64
LotArea          100 non-null int64
Street           100 non-null object
Alley            6 non-null object
LotShape         100 non-null object
LandContour      100 non-null object
Utilities        100 non-null object
LotConfig        100 non-null object
LandSlope        100 non-null object
Neighborhood     100 non-null object
Condition1       100 non-null object
Condition2       100 non-null object
BldgType         100 non-null object
HouseStyle       100 non-null object
OverallQual      100 non-null int64
OverallCond      100 non-null int64
YearBuilt        100 non-null int64
YearRemodAdd     100 non-null int64
RoofStyle        100 non-null object
RoofMatl         100 non-null object
Exterior1st      100 non-null object
Exterior2nd      100 non-null object
MasVnrType     

In [212]:
df.describe()

,MSSubClass,LotFrontage,LotArea,OverallQual,OverallCond,YearBuilt,YearRemodAdd,MasVnrArea,BsmtFinSF1,BsmtFinSF2,BsmtUnfSF,TotalBsmtSF,1stFlrSF,2ndFlrSF,LowQualFinSF,...,Fireplaces,GarageYrBlt,GarageCars,GarageArea,WoodDeckSF,OpenPorchSF,EnclosedPorch,3SsnPorch,ScreenPorch,PoolArea,PoolQC,MiscVal,MoSold,YrSold,SalePrice
count,100.0,86.0,100.0,100.0,100.0,100.0,100.0,100.0,100.0,100.0,100.0,100.0,100.0,100.0,100.0,...,100.0,95.0,100.0,100.0,100.0,100.0,100.0,100.0,100.0,100.0,1.0,100.0,100.0,100.0,100.0
mean,51.1,70.651163,10055.47,5.94,5.45,1951.86,1964.37,121.64,454.56,33.91,558.79,1047.26,1132.01,297.21,8.73,...,0.53,1956.547368,1.75,466.37,93.82,46.9,24.13,7.27,10.16,0.0,0.0,46.3,6.27,1987.82,173000.66
std,42.68501,22.447202,5213.886579,1.631879,1.122542,199.326747,199.504571,213.553982,462.034922,131.575158,448.599196,408.500303,361.935942,436.049768,62.372918,...,0.626921,204.591641,0.757121,199.121082,139.576877,59.743306,60.211019,51.518724,46.908835,0.0,,153.692958,3.113525,200.794502,73739.179035
min,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
25%,20.0,60.0,7643.25,5.0,5.0,1954.0,1964.75,0.0,0.0,0.0,203.0,822.0,896.25,0.0,0.0,...,0.0,1960.5,1.0,352.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,4.0,2007.0,129362.5
50%,37.5,70.0,9595.5,6.0,5.0,1970.0,1994.0,0.0,416.0,0.0,440.0,1034.5,1081.5,0.0,0.0,...,0.0,1977.0,2.0,480.0,0.0,30.0,0.0,0.0,0.0,0.0,0.0,0.0,6.0,2008.0,153750.0
75%,60.0,84.0,11243.25,7.0,6.0,2000.25,2003.25,188.5,737.5,0.0,820.0,1268.5,1325.0,686.0,0.0,...,1.0,2002.5,2.0,576.0,149.75,72.0,0.0,0.0,0.0,0.0,0.0,0.0,8.0,2009.0,207750.0
max,190.0,122.0,50271.0,10.0,8.0,2009.0,2009.0,1115.0,1880.0,712.0,1777.0,2223.0,2223.0,1519.0,513.0,...,2.0,2009.0,3.0,894.0,857.0,258.0,272.0,407.0,291.0,0.0,0.0,700.0,12.0,2010.0,438780.0


In [213]:
df.describe()['LotArea']

count      100.000000
mean     10055.470000
std       5213.886579
min          0.000000
25%       7643.250000
50%       9595.500000
75%      11243.250000
max      50271.000000
Name: LotArea, dtype: float64

## Statistics and aggregate functions

### Statistics for a `Series`

For example, for columns:

In [75]:
df.LotArea.count()

100

In [76]:
df.LotFrontage.count()

86

In [77]:
df.LotArea.min()

1596

In [78]:
df.LotArea.max()

50271

In [79]:
df.LotArea.mean()

10115.870000000001

In [80]:
df.LotArea.median()

9595.5

In [81]:
df.LotArea.std()

5130.5417629046833

In [85]:
df.LotArea.var()

26322458.780909095

In [83]:
df.LotArea.quantile(0.25)

7643.25

In [88]:
df.LotArea.quantile([0.0, 0.25, 0.5, 0.75, 1.0])

0.00     1596.00
0.25     7643.25
0.50     9595.50
0.75    11243.25
1.00    50271.00
Name: LotArea, dtype: float64

The same works for rows (and for any `Series` in general):

In [90]:
df.loc[1].count()

75

### Statistics for a `DataFrame`

The methods are called the same way; by default each statistic is computed for every column ("along the rows"):

In [91]:
df.count()

MSSubClass       100
MSZoning         100
LotFrontage       86
LotArea          100
Street           100
                ... 
MoSold           100
YrSold           100
SaleType         100
SaleCondition    100
SalePrice        100
Length: 80, dtype: int64

A statistic can also be computed for every row ("along the columns"):

In [92]:
df.count(axis=1)

Id
1      75
2      76
3      76
4      76
5      76
       ..
96     76
97     75
98     75
99     76
100    71
Length: 100, dtype: int64
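
The meaning of `axis` is easier to check on a table small enough to count by hand. In this sketch the frame `toy` and its values are made up (they are not rows of `train.csv`); only the column names are borrowed from the dataset.

```python
import numpy as np
import pandas as pd

# Made-up 3x2 table with a few missing values.
toy = pd.DataFrame(
    {'LotFrontage': [65.0, np.nan, 68.0],
     'Alley': [np.nan, np.nan, 'Pave']},
    index=pd.Index([1, 2, 3], name='Id'),
)

per_column = toy.count()        # default axis=0: one result per column
per_row = toy.count(axis=1)     # axis=1: one result per row

print(per_column)
print(per_row)
```

`per_column` is indexed by column names and `per_row` by `Id`, which matches the two outputs above: the result keeps the labels of the axis that was not collapsed.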

In [94]:
df[['LotArea', 'MasVnrArea', 'GrLivArea',
    'GarageArea', 'PoolArea']].sum(axis=1)

Id
1      10904
2      11322
3      13806
4      11909
5      17644
       ...  
96     11723
97     12507
98     12361
99     11826
100    10545
Length: 100, dtype: int64

In [96]:
df.min().min()

0

### Unique values

In [214]:
df.Neighborhood.value_counts()

NAmes      21
CollgCr    14
NridgHt     9
OldTown     7
Sawyer      7
           ..
Timber      1
NWAmes      1
ClearCr     1
StoneBr     1
0           1
Name: Neighborhood, Length: 21, dtype: int64

In [102]:
df.Alley.value_counts(dropna=False)

NaN     94
Pave     3
Grvl     3
Name: Alley, dtype: int64

In [103]:
df.Alley.value_counts(dropna=False, normalize=True)

NaN     0.94
Pave    0.03
Grvl    0.03
Name: Alley, dtype: float64

In [106]:
df.Alley.nunique(dropna=False)

3

In [108]:
df.Alley.unique()

array([nan, 'Grvl', 'Pave'], dtype=object)

## Computing new values ("formulas")

### Arithmetic operations

Binary operations with a scalar (a single value) are applied element-wise:

In [111]:
df.LotArea.head()

Id
1     8450
2     9600
3    11250
4     9550
5    14260
Name: LotArea, dtype: int64

In [112]:
(2 * df.LotArea).head()

Id
1    16900
2    19200
3    22500
4    19100
5    28520
Name: LotArea, dtype: int64

In [113]:
df.LotArea - 1000

Id
1       7450
2       8600
3      10250
4       8550
5      13260
       ...  
96      8765
97      9264
98      9921
99      9625
100     8320
Name: LotArea, Length: 100, dtype: int64

In [114]:
(df.LotArea - df.LotArea.mean()) / df.LotArea.std()

Id
1     -0.324697
2     -0.100549
3      0.221055
4     -0.110294
5      0.807737
         ...   
96    -0.068388
97     0.028872
98     0.156929
99     0.099235
100   -0.155124
Name: LotArea, Length: 100, dtype: float64

The same operations work for two `Series`; values are matched by index label:

In [115]:
df.SalePrice / df.LotArea

Id
1      24.674556
2      18.906250
3      19.866667
4      14.659686
5      17.531557
         ...    
96     18.945212
97     20.849571
98      8.675945
99      7.811765
100    13.835837
Length: 100, dtype: float64

In [116]:
df.LotArea + df.GarageArea

Id
1       8998
2      10060
3      11858
4      10192
5      15096
       ...  
96     10185
97     10736
98     11353
99     10991
100     9320
Length: 100, dtype: int64

### Checking conditions

In [117]:
df.LotArea > 10000

Id
1      False
2      False
3       True
4      False
5       True
       ...  
96     False
97      True
98      True
99      True
100    False
Name: LotArea, Length: 100, dtype: bool

In [118]:
df.SaleCondition == 'Normal'

Id
1       True
2       True
3       True
4      False
5       True
       ...  
96      True
97      True
98      True
99     False
100     True
Name: SaleCondition, Length: 100, dtype: bool

In [123]:
(df.GarageYrBlt > df.YearBuilt).any()

True

In [124]:
df.Fence.isnull()

Id
1      True
2      True
3      True
4      True
5      True
       ... 
96     True
97     True
98     True
99     True
100    True
Name: Fence, Length: 100, dtype: bool

In [125]:
df.Fence.notnull()

Id
1      False
2      False
3      False
4      False
5      False
       ...  
96     False
97     False
98     False
99     False
100    False
Name: Fence, Length: 100, dtype: bool

In [126]:
df.LotConfig.isin(['Inside', 'Corner'])  # value in [...]

Id
1       True
2      False
3       True
4       True
5      False
       ...  
96      True
97      True
98      True
99      True
100     True
Name: LotConfig, Length: 100, dtype: bool

In [127]:
~df.LotConfig.isin(['Inside', 'Corner'])  # "not"

Id
1      False
2       True
3      False
4      False
5       True
       ...  
96     False
97     False
98     False
99     False
100    False
Name: LotConfig, Length: 100, dtype: bool

In [128]:
(df.SalePrice < 100000) & (df.LotArea > 10000)  # "and"

Id
1      False
2      False
3      False
4      False
5      False
       ...  
96     False
97     False
98      True
99      True
100    False
Length: 100, dtype: bool

In [129]:
(df.SalePrice < 100000) | (df.LotArea > 10000)  # "or"

Id
1      False
2      False
3       True
4      False
5       True
       ...  
96     False
97      True
98      True
99      True
100    False
Length: 100, dtype: bool
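
The parentheses and the operators `&`, `|`, `~` in the cells above are not interchangeable with Python's `and`, `or`, `not`. A short sketch of what happens otherwise, on a toy frame built from the `SalePrice` and `LotArea` values of rows 1, 98 and 99 shown earlier (the names `toy` and `and_failed` are illustrative):

```python
import pandas as pd

# SalePrice and LotArea of rows 1, 98 and 99 from the table above.
toy = pd.DataFrame(
    {'SalePrice': [208500, 94750, 83000],
     'LotArea': [8450, 10921, 10625]},
    index=pd.Index([1, 98, 99], name='Id'),
)

# Element-wise "and": each comparison is wrapped in parentheses
# because & binds more tightly than < and >.
cheap_and_large = (toy.SalePrice < 100000) & (toy.LotArea > 10000)
print(cheap_and_large.tolist())  # [False, True, True]

# Python's `and` needs a single True/False, which a whole Series
# cannot provide, so pandas raises ValueError.
try:
    (toy.SalePrice < 100000) and (toy.LotArea > 10000)
    and_failed = False
except ValueError:
    and_failed = True

print(and_failed)  # True
```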

### More advanced math

In [215]:
df.SalePrice**2

Id
1      43472250000
2      32942250000
3      49952250000
4      19600000000
5      62500000000
          ...     
96     34225000000
97     45796000000
98      8977562500
99      6889000000
100    16628102500
Name: SalePrice, Length: 100, dtype: int64

In [131]:
from math import log

In [132]:
log(df.SalePrice)

TypeError: cannot convert the series to <type 'float'>

`math.log` expects a single number, so it cannot process a whole `Series`. Under the hood, `pandas` relies on the lower-level scientific computing library `numpy`, so the mathematical functions from that library can be used instead:

In [133]:
import numpy as np

In [134]:
np.sqrt(df.LotArea)

Id
1       91.923882
2       97.979590
3      106.066017
4       97.724101
5      119.415242
          ...    
96      98.818015
97     101.311401
98     104.503588
99     103.077641
100     96.540147
Name: LotArea, Length: 100, dtype: float64

In [135]:
np.log(df.SalePrice)

Id
1      12.247694
2      12.109011
3      12.317167
4      11.849398
5      12.429216
         ...    
96     12.128111
97     12.273731
98     11.458997
99     11.326596
100    11.767180
Name: SalePrice, Length: 100, dtype: float64

### Applying arbitrary functions

For a `Series`:

In [136]:
def price_class(price):
    return 'Low' if price < 200000 else 'High'

In [139]:
df.SalePrice.apply(price_class)

Id
1      High
2       Low
3      High
4       Low
5      High
       ... 
96      Low
97     High
98      Low
99      Low
100     Low
Name: SalePrice, Length: 100, dtype: object
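
A simple two-way rule like `price_class` can also be written without `apply`, using `numpy.where` on a boolean condition. This is one possible alternative, not the approach used in the cells above; the series `prices` holds the first three `SalePrice` values, and the name `labels` is illustrative.

```python
import numpy as np
import pandas as pd

# First three SalePrice values from the table above.
prices = pd.Series(
    [208500, 181500, 223500],
    index=pd.Index([1, 2, 3], name='Id'),
    name='SalePrice',
)

# np.where picks 'Low' where the condition holds and 'High' elsewhere;
# it returns a plain array, so the index is attached again explicitly.
labels = pd.Series(
    np.where(prices < 200000, 'Low', 'High'),
    index=prices.index,
    name=prices.name,
)

print(labels.tolist())  # ['High', 'Low', 'High']
```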

For a `DataFrame`:

In [140]:
def interesting(row):
    if (row['PoolArea'] > 0) and (row['SalePrice'] < 250000):
        return 'Interesting, has pool'
    elif row['SalePrice'] < 200000:
        return 'Interesting, no pool'
    else:
        return 'Too expensive'

In [141]:
df.apply(interesting, axis=1)

Id
1             Too expensive
2      Interesting, no pool
3             Too expensive
4      Interesting, no pool
5             Too expensive
               ...         
96     Interesting, no pool
97            Too expensive
98     Interesting, no pool
99     Interesting, no pool
100    Interesting, no pool
Length: 100, dtype: object

Question: why use element-wise operations and functions when `apply` can do the same job?

In [142]:
%%timeit
df.SalePrice**2

1000 loops, best of 3: 187 µs per loop


In [143]:
%%timeit
df.SalePrice.apply(lambda x: x**2)

1000 loops, best of 3: 259 µs per loop
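
The `%%timeit` magic only works inside IPython or Jupyter. Outside a notebook, a comparable measurement can be sketched with the standard `timeit` module. The synthetic series `prices` with 100,000 values is an assumption made here so that the difference is easier to observe; actual timings depend on the machine and the library versions, so none are quoted.

```python
import timeit

import numpy as np
import pandas as pd

# Synthetic data: 100,000 integers (an assumption, not the housing data).
prices = pd.Series(np.arange(1, 100001, dtype='int64'))

# Both approaches compute the same values.
vectorized = prices ** 2
via_apply = prices.apply(lambda x: x ** 2)
print((vectorized == via_apply).all())

# Total time of 5 runs of each variant, in seconds.
t_vec = timeit.timeit(lambda: prices ** 2, number=5)
t_apply = timeit.timeit(lambda: prices.apply(lambda x: x ** 2), number=5)
print(t_vec, t_apply)
```

On a table of 100 rows the gap is modest, as in the timings above. It typically widens as the data grows, because `apply` calls a Python function once per element.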


## Adding and modifying rows and columns

### Columns

We use indexing together with the rules for computing formulas:

In [144]:
df['PricePerArea'] = df.SalePrice / df.LotArea

In [145]:
df

Id,MSSubClass,MSZoning,LotFrontage,LotArea,Street,Alley,LotShape,LandContour,Utilities,LotConfig,LandSlope,Neighborhood,Condition1,Condition2,BldgType,...,OpenPorchSF,EnclosedPorch,3SsnPorch,ScreenPorch,PoolArea,PoolQC,Fence,MiscFeature,MiscVal,MoSold,YrSold,SaleType,SaleCondition,SalePrice,PricePerArea
1,60,RL,65.0,8450,Pave,,Reg,Lvl,AllPub,Inside,Gtl,CollgCr,Norm,Norm,1Fam,...,61,0,0,0,0,,,,0,2,2008,WD,Normal,208500,24.674556
2,20,RL,80.0,9600,Pave,,Reg,Lvl,AllPub,FR2,Gtl,Veenker,Feedr,Norm,1Fam,...,0,0,0,0,0,,,,0,5,2007,WD,Normal,181500,18.906250
3,60,RL,68.0,11250,Pave,,IR1,Lvl,AllPub,Inside,Gtl,CollgCr,Norm,Norm,1Fam,...,42,0,0,0,0,,,,0,9,2008,WD,Normal,223500,19.866667
4,70,RL,60.0,9550,Pave,,IR1,Lvl,AllPub,Corner,Gtl,Crawfor,Norm,Norm,1Fam,...,35,272,0,0,0,,,,0,2,2006,WD,Abnorml,140000,14.659686
5,60,RL,84.0,14260,Pave,,IR1,Lvl,AllPub,FR2,Gtl,NoRidge,Norm,Norm,1Fam,...,84,0,0,0,0,,,,0,12,2008,WD,Normal,250000,17.531557
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
96,60,RL,,9765,Pave,,IR2,Lvl,AllPub,Corner,Gtl,Gilbert,Norm,Norm,1Fam,...,63,0,0,0,0,,,Shed,480,4,2009,WD,Normal,185000,18.945212
97,20,RL,78.0,10264,Pave,,IR1,Lvl,AllPub,Inside,Gtl,CollgCr,Norm,Norm,1Fam,...,29,0,0,0,0,,,,0,8,2006,WD,Normal,214000,20.849571
98,20,RL,73.0,10921,Pave,,Reg,HLS,AllPub,Inside,Gtl,Edwards,Norm,Norm,1Fam,...,0,0,0,0,0,,,,0,5,2007,WD,Normal,94750,8.675945
99,30,RL,85.0,10625,Pave,,Reg,Lvl,AllPub,Corner,Gtl,Edwards,Norm,Norm,1Fam,...,0,77,0,0,0,,,Shed,400,5,2010,COD,Abnorml,83000,7.811765
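
Assigning to `df['PricePerArea']` modifies `df` in place. When the original table should stay untouched, `DataFrame.assign` returns a new table with the extra column. The following is a minimal sketch. The `houses` table is a stand-in with values copied from the first three rows shown above, and `with_ratio` is an illustrative name.

```python
import pandas as pd

# Stand-in table: LotArea and SalePrice of houses 1-3
houses = pd.DataFrame({'LotArea': [8450, 9600, 11250],
                       'SalePrice': [208500, 181500, 223500]},
                      index=pd.Index([1, 2, 3], name='Id'))

# assign returns a new DataFrame; `houses` itself is left unchanged
with_ratio = houses.assign(PricePerArea=houses.SalePrice / houses.LotArea)

print(with_ratio)
```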


In [146]:
df['VeryUsefulColumn'] = 0

In [147]:
df

Id,MSSubClass,MSZoning,LotFrontage,LotArea,Street,Alley,LotShape,LandContour,Utilities,LotConfig,LandSlope,Neighborhood,Condition1,Condition2,BldgType,...,EnclosedPorch,3SsnPorch,ScreenPorch,PoolArea,PoolQC,Fence,MiscFeature,MiscVal,MoSold,YrSold,SaleType,SaleCondition,SalePrice,PricePerArea,VeryUsefulColumn
1,60,RL,65.0,8450,Pave,,Reg,Lvl,AllPub,Inside,Gtl,CollgCr,Norm,Norm,1Fam,...,0,0,0,0,,,,0,2,2008,WD,Normal,208500,24.674556,0
2,20,RL,80.0,9600,Pave,,Reg,Lvl,AllPub,FR2,Gtl,Veenker,Feedr,Norm,1Fam,...,0,0,0,0,,,,0,5,2007,WD,Normal,181500,18.906250,0
3,60,RL,68.0,11250,Pave,,IR1,Lvl,AllPub,Inside,Gtl,CollgCr,Norm,Norm,1Fam,...,0,0,0,0,,,,0,9,2008,WD,Normal,223500,19.866667,0
4,70,RL,60.0,9550,Pave,,IR1,Lvl,AllPub,Corner,Gtl,Crawfor,Norm,Norm,1Fam,...,272,0,0,0,,,,0,2,2006,WD,Abnorml,140000,14.659686,0
5,60,RL,84.0,14260,Pave,,IR1,Lvl,AllPub,FR2,Gtl,NoRidge,Norm,Norm,1Fam,...,0,0,0,0,,,,0,12,2008,WD,Normal,250000,17.531557,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
96,60,RL,,9765,Pave,,IR2,Lvl,AllPub,Corner,Gtl,Gilbert,Norm,Norm,1Fam,...,0,0,0,0,,,Shed,480,4,2009,WD,Normal,185000,18.945212,0
97,20,RL,78.0,10264,Pave,,IR1,Lvl,AllPub,Inside,Gtl,CollgCr,Norm,Norm,1Fam,...,0,0,0,0,,,,0,8,2006,WD,Normal,214000,20.849571,0
98,20,RL,73.0,10921,Pave,,Reg,HLS,AllPub,Inside,Gtl,Edwards,Norm,Norm,1Fam,...,0,0,0,0,,,,0,5,2007,WD,Normal,94750,8.675945,0
99,30,RL,85.0,10625,Pave,,Reg,Lvl,AllPub,Corner,Gtl,Edwards,Norm,Norm,1Fam,...,77,0,0,0,,,Shed,400,5,2010,COD,Abnorml,83000,7.811765,0


In [148]:
df.loc[[1, 3, 4], 'VeryUsefulColumn'] = 1

In [149]:
df

Id,MSSubClass,MSZoning,LotFrontage,LotArea,Street,Alley,LotShape,LandContour,Utilities,LotConfig,LandSlope,Neighborhood,Condition1,Condition2,BldgType,...,EnclosedPorch,3SsnPorch,ScreenPorch,PoolArea,PoolQC,Fence,MiscFeature,MiscVal,MoSold,YrSold,SaleType,SaleCondition,SalePrice,PricePerArea,VeryUsefulColumn
1,60,RL,65.0,8450,Pave,,Reg,Lvl,AllPub,Inside,Gtl,CollgCr,Norm,Norm,1Fam,...,0,0,0,0,,,,0,2,2008,WD,Normal,208500,24.674556,1
2,20,RL,80.0,9600,Pave,,Reg,Lvl,AllPub,FR2,Gtl,Veenker,Feedr,Norm,1Fam,...,0,0,0,0,,,,0,5,2007,WD,Normal,181500,18.906250,0
3,60,RL,68.0,11250,Pave,,IR1,Lvl,AllPub,Inside,Gtl,CollgCr,Norm,Norm,1Fam,...,0,0,0,0,,,,0,9,2008,WD,Normal,223500,19.866667,1
4,70,RL,60.0,9550,Pave,,IR1,Lvl,AllPub,Corner,Gtl,Crawfor,Norm,Norm,1Fam,...,272,0,0,0,,,,0,2,2006,WD,Abnorml,140000,14.659686,1
5,60,RL,84.0,14260,Pave,,IR1,Lvl,AllPub,FR2,Gtl,NoRidge,Norm,Norm,1Fam,...,0,0,0,0,,,,0,12,2008,WD,Normal,250000,17.531557,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
96,60,RL,,9765,Pave,,IR2,Lvl,AllPub,Corner,Gtl,Gilbert,Norm,Norm,1Fam,...,0,0,0,0,,,Shed,480,4,2009,WD,Normal,185000,18.945212,0
97,20,RL,78.0,10264,Pave,,IR1,Lvl,AllPub,Inside,Gtl,CollgCr,Norm,Norm,1Fam,...,0,0,0,0,,,,0,8,2006,WD,Normal,214000,20.849571,0
98,20,RL,73.0,10921,Pave,,Reg,HLS,AllPub,Inside,Gtl,Edwards,Norm,Norm,1Fam,...,0,0,0,0,,,,0,5,2007,WD,Normal,94750,8.675945,0
99,30,RL,85.0,10625,Pave,,Reg,Lvl,AllPub,Corner,Gtl,Edwards,Norm,Norm,1Fam,...,77,0,0,0,,,Shed,400,5,2010,COD,Abnorml,83000,7.811765,0


In [150]:
df.loc[[1, 3, 4], 'VeryUsefulColumn'] = [2, 4, 6]

In [151]:
df

Id,MSSubClass,MSZoning,LotFrontage,LotArea,Street,Alley,LotShape,LandContour,Utilities,LotConfig,LandSlope,Neighborhood,Condition1,Condition2,BldgType,...,EnclosedPorch,3SsnPorch,ScreenPorch,PoolArea,PoolQC,Fence,MiscFeature,MiscVal,MoSold,YrSold,SaleType,SaleCondition,SalePrice,PricePerArea,VeryUsefulColumn
1,60,RL,65.0,8450,Pave,,Reg,Lvl,AllPub,Inside,Gtl,CollgCr,Norm,Norm,1Fam,...,0,0,0,0,,,,0,2,2008,WD,Normal,208500,24.674556,2
2,20,RL,80.0,9600,Pave,,Reg,Lvl,AllPub,FR2,Gtl,Veenker,Feedr,Norm,1Fam,...,0,0,0,0,,,,0,5,2007,WD,Normal,181500,18.906250,0
3,60,RL,68.0,11250,Pave,,IR1,Lvl,AllPub,Inside,Gtl,CollgCr,Norm,Norm,1Fam,...,0,0,0,0,,,,0,9,2008,WD,Normal,223500,19.866667,4
4,70,RL,60.0,9550,Pave,,IR1,Lvl,AllPub,Corner,Gtl,Crawfor,Norm,Norm,1Fam,...,272,0,0,0,,,,0,2,2006,WD,Abnorml,140000,14.659686,6
5,60,RL,84.0,14260,Pave,,IR1,Lvl,AllPub,FR2,Gtl,NoRidge,Norm,Norm,1Fam,...,0,0,0,0,,,,0,12,2008,WD,Normal,250000,17.531557,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
96,60,RL,,9765,Pave,,IR2,Lvl,AllPub,Corner,Gtl,Gilbert,Norm,Norm,1Fam,...,0,0,0,0,,,Shed,480,4,2009,WD,Normal,185000,18.945212,0
97,20,RL,78.0,10264,Pave,,IR1,Lvl,AllPub,Inside,Gtl,CollgCr,Norm,Norm,1Fam,...,0,0,0,0,,,,0,8,2006,WD,Normal,214000,20.849571,0
98,20,RL,73.0,10921,Pave,,Reg,HLS,AllPub,Inside,Gtl,Edwards,Norm,Norm,1Fam,...,0,0,0,0,,,,0,5,2007,WD,Normal,94750,8.675945,0
99,30,RL,85.0,10625,Pave,,Reg,Lvl,AllPub,Corner,Gtl,Edwards,Norm,Norm,1Fam,...,77,0,0,0,,,Shed,400,5,2010,COD,Abnorml,83000,7.811765,0


In [156]:
df.loc[[1, 3, 4], 'VeryUsefulColumn'] = df.SalePrice

In [155]:
df

Id,MSSubClass,MSZoning,LotFrontage,LotArea,Street,Alley,LotShape,LandContour,Utilities,LotConfig,LandSlope,Neighborhood,Condition1,Condition2,BldgType,...,EnclosedPorch,3SsnPorch,ScreenPorch,PoolArea,PoolQC,Fence,MiscFeature,MiscVal,MoSold,YrSold,SaleType,SaleCondition,SalePrice,PricePerArea,VeryUsefulColumn
1,60,RL,65.0,8450,Pave,,Reg,Lvl,AllPub,Inside,Gtl,CollgCr,Norm,Norm,1Fam,...,0,0,0,0,,,,0,2,2008,WD,Normal,208500,24.674556,208500
2,20,RL,80.0,9600,Pave,,Reg,Lvl,AllPub,FR2,Gtl,Veenker,Feedr,Norm,1Fam,...,0,0,0,0,,,,0,5,2007,WD,Normal,181500,18.906250,0
3,60,RL,68.0,11250,Pave,,IR1,Lvl,AllPub,Inside,Gtl,CollgCr,Norm,Norm,1Fam,...,0,0,0,0,,,,0,9,2008,WD,Normal,223500,19.866667,223500
4,70,RL,60.0,9550,Pave,,IR1,Lvl,AllPub,Corner,Gtl,Crawfor,Norm,Norm,1Fam,...,272,0,0,0,,,,0,2,2006,WD,Abnorml,140000,14.659686,140000
5,60,RL,84.0,14260,Pave,,IR1,Lvl,AllPub,FR2,Gtl,NoRidge,Norm,Norm,1Fam,...,0,0,0,0,,,,0,12,2008,WD,Normal,250000,17.531557,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
96,60,RL,,9765,Pave,,IR2,Lvl,AllPub,Corner,Gtl,Gilbert,Norm,Norm,1Fam,...,0,0,0,0,,,Shed,480,4,2009,WD,Normal,185000,18.945212,0
97,20,RL,78.0,10264,Pave,,IR1,Lvl,AllPub,Inside,Gtl,CollgCr,Norm,Norm,1Fam,...,0,0,0,0,,,,0,8,2006,WD,Normal,214000,20.849571,0
98,20,RL,73.0,10921,Pave,,Reg,HLS,AllPub,Inside,Gtl,Edwards,Norm,Norm,1Fam,...,0,0,0,0,,,,0,5,2007,WD,Normal,94750,8.675945,0
99,30,RL,85.0,10625,Pave,,Reg,Lvl,AllPub,Corner,Gtl,Edwards,Norm,Norm,1Fam,...,77,0,0,0,,,Shed,400,5,2010,COD,Abnorml,83000,7.811765,0
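
In the last assignment the right-hand side is the whole `SalePrice` column, yet only rows 1, 3 and 4 change: pandas matches the assigned values to the selected rows by index label, not by position. A small sketch of that behaviour on made-up data (the names `toy`, `Price` and `Flag` are illustrative):

```python
import pandas as pd

toy = pd.DataFrame({'Price': [10, 20, 30, 40, 50], 'Flag': 0},
                   index=pd.Index([1, 2, 3, 4, 5], name='Id'))

# The right-hand Series has five values, but only the labels 1, 3 and 4
# are selected on the left, so only those rows receive their own Price
toy.loc[[1, 3, 4], 'Flag'] = toy['Price']

print(toy)
```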


### Rows

In [159]:
df.loc[140] = 0

In [160]:
df

Id,MSSubClass,MSZoning,LotFrontage,LotArea,Street,Alley,LotShape,LandContour,Utilities,LotConfig,LandSlope,Neighborhood,Condition1,Condition2,BldgType,...,EnclosedPorch,3SsnPorch,ScreenPorch,PoolArea,PoolQC,Fence,MiscFeature,MiscVal,MoSold,YrSold,SaleType,SaleCondition,SalePrice,PricePerArea,VeryUsefulColumn
1,60,RL,65.0,8450,Pave,,Reg,Lvl,AllPub,Inside,Gtl,CollgCr,Norm,Norm,1Fam,...,0,0,0,0,,,,0,2,2008,WD,Normal,208500,24.674556,208500
2,20,RL,80.0,9600,Pave,,Reg,Lvl,AllPub,FR2,Gtl,Veenker,Feedr,Norm,1Fam,...,0,0,0,0,,,,0,5,2007,WD,Normal,181500,18.906250,0
3,60,RL,68.0,11250,Pave,,IR1,Lvl,AllPub,Inside,Gtl,CollgCr,Norm,Norm,1Fam,...,0,0,0,0,,,,0,9,2008,WD,Normal,223500,19.866667,223500
4,70,RL,60.0,9550,Pave,,IR1,Lvl,AllPub,Corner,Gtl,Crawfor,Norm,Norm,1Fam,...,272,0,0,0,,,,0,2,2006,WD,Abnorml,140000,14.659686,140000
5,60,RL,84.0,14260,Pave,,IR1,Lvl,AllPub,FR2,Gtl,NoRidge,Norm,Norm,1Fam,...,0,0,0,0,,,,0,12,2008,WD,Normal,250000,17.531557,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
97,20,RL,78.0,10264,Pave,,IR1,Lvl,AllPub,Inside,Gtl,CollgCr,Norm,Norm,1Fam,...,0,0,0,0,,,,0,8,2006,WD,Normal,214000,20.849571,0
98,20,RL,73.0,10921,Pave,,Reg,HLS,AllPub,Inside,Gtl,Edwards,Norm,Norm,1Fam,...,0,0,0,0,,,,0,5,2007,WD,Normal,94750,8.675945,0
99,30,RL,85.0,10625,Pave,,Reg,Lvl,AllPub,Corner,Gtl,Edwards,Norm,Norm,1Fam,...,77,0,0,0,,,Shed,400,5,2010,COD,Abnorml,83000,7.811765,0
100,20,RL,77.0,9320,Pave,,IR1,Lvl,AllPub,Inside,Gtl,NAmes,Norm,Norm,1Fam,...,0,0,0,0,,,Shed,400,1,2010,WD,Normal,128950,13.835837,0


In [161]:
df.loc['count'] = df.count()

In [162]:
df

Id,MSSubClass,MSZoning,LotFrontage,LotArea,Street,Alley,LotShape,LandContour,Utilities,LotConfig,LandSlope,Neighborhood,Condition1,Condition2,BldgType,...,EnclosedPorch,3SsnPorch,ScreenPorch,PoolArea,PoolQC,Fence,MiscFeature,MiscVal,MoSold,YrSold,SaleType,SaleCondition,SalePrice,PricePerArea,VeryUsefulColumn
1,60,RL,65.0,8450,Pave,,Reg,Lvl,AllPub,Inside,Gtl,CollgCr,Norm,Norm,1Fam,...,0,0,0,0,,,,0,2,2008,WD,Normal,208500,24.674556,208500
2,20,RL,80.0,9600,Pave,,Reg,Lvl,AllPub,FR2,Gtl,Veenker,Feedr,Norm,1Fam,...,0,0,0,0,,,,0,5,2007,WD,Normal,181500,18.906250,0
3,60,RL,68.0,11250,Pave,,IR1,Lvl,AllPub,Inside,Gtl,CollgCr,Norm,Norm,1Fam,...,0,0,0,0,,,,0,9,2008,WD,Normal,223500,19.866667,223500
4,70,RL,60.0,9550,Pave,,IR1,Lvl,AllPub,Corner,Gtl,Crawfor,Norm,Norm,1Fam,...,272,0,0,0,,,,0,2,2006,WD,Abnorml,140000,14.659686,140000
5,60,RL,84.0,14260,Pave,,IR1,Lvl,AllPub,FR2,Gtl,NoRidge,Norm,Norm,1Fam,...,0,0,0,0,,,,0,12,2008,WD,Normal,250000,17.531557,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
98,20,RL,73.0,10921,Pave,,Reg,HLS,AllPub,Inside,Gtl,Edwards,Norm,Norm,1Fam,...,0,0,0,0,,,,0,5,2007,WD,Normal,94750,8.675945,0
99,30,RL,85.0,10625,Pave,,Reg,Lvl,AllPub,Corner,Gtl,Edwards,Norm,Norm,1Fam,...,77,0,0,0,,,Shed,400,5,2010,COD,Abnorml,83000,7.811765,0
100,20,RL,77.0,9320,Pave,,IR1,Lvl,AllPub,Inside,Gtl,NAmes,Norm,Norm,1Fam,...,0,0,0,0,,,Shed,400,1,2010,WD,Normal,128950,13.835837,0
140,0,0,0.0,0,0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0.0,0,0,0,0,0,0,0,0,0.000000,0


In [163]:
df.index

Index([       1,        2,        3,        4,        5,        6,        7,
              8,        9,       10,
       ...
             93,       94,       95,       96,       97,       98,       99,
            100,      140, u'count'],
      dtype='object', name=u'Id', length=102)
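
The index printed above has `dtype='object'`: once the label `'count'` is added, it mixes integers with a string. One alternative, sketched here on made-up data, is to keep such summaries in a separate object and leave the table itself holding only per-house rows. The names `toy`, `counts` and `appended` are illustrative.

```python
import pandas as pd

toy = pd.DataFrame({'LotFrontage': [65.0, None, 68.0],
                    'SalePrice': [208500, 181500, 223500]},
                   index=pd.Index([1, 2, 3], name='Id'))

# Keep the per-column counts of non-missing values as a separate Series
counts = toy.count()

# For comparison: appending them as a row changes the index dtype
appended = toy.copy()
appended.loc['count'] = toy.count()

print(counts)
print(toy.index.dtype)       # still an integer index
print(appended.index.dtype)  # object: integers mixed with a string label
```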

## Deleting rows and columns

### Columns

In [164]:
df.drop('VeryUsefulColumn', axis=1)

Id,MSSubClass,MSZoning,LotFrontage,LotArea,Street,Alley,LotShape,LandContour,Utilities,LotConfig,LandSlope,Neighborhood,Condition1,Condition2,BldgType,...,OpenPorchSF,EnclosedPorch,3SsnPorch,ScreenPorch,PoolArea,PoolQC,Fence,MiscFeature,MiscVal,MoSold,YrSold,SaleType,SaleCondition,SalePrice,PricePerArea
1,60,RL,65.0,8450,Pave,,Reg,Lvl,AllPub,Inside,Gtl,CollgCr,Norm,Norm,1Fam,...,61,0,0,0,0,,,,0,2,2008,WD,Normal,208500,24.674556
2,20,RL,80.0,9600,Pave,,Reg,Lvl,AllPub,FR2,Gtl,Veenker,Feedr,Norm,1Fam,...,0,0,0,0,0,,,,0,5,2007,WD,Normal,181500,18.906250
3,60,RL,68.0,11250,Pave,,IR1,Lvl,AllPub,Inside,Gtl,CollgCr,Norm,Norm,1Fam,...,42,0,0,0,0,,,,0,9,2008,WD,Normal,223500,19.866667
4,70,RL,60.0,9550,Pave,,IR1,Lvl,AllPub,Corner,Gtl,Crawfor,Norm,Norm,1Fam,...,35,272,0,0,0,,,,0,2,2006,WD,Abnorml,140000,14.659686
5,60,RL,84.0,14260,Pave,,IR1,Lvl,AllPub,FR2,Gtl,NoRidge,Norm,Norm,1Fam,...,84,0,0,0,0,,,,0,12,2008,WD,Normal,250000,17.531557
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
98,20,RL,73.0,10921,Pave,,Reg,HLS,AllPub,Inside,Gtl,Edwards,Norm,Norm,1Fam,...,0,0,0,0,0,,,,0,5,2007,WD,Normal,94750,8.675945
99,30,RL,85.0,10625,Pave,,Reg,Lvl,AllPub,Corner,Gtl,Edwards,Norm,Norm,1Fam,...,0,77,0,0,0,,,Shed,400,5,2010,COD,Abnorml,83000,7.811765
100,20,RL,77.0,9320,Pave,,IR1,Lvl,AllPub,Inside,Gtl,NAmes,Norm,Norm,1Fam,...,0,0,0,0,0,,,Shed,400,1,2010,WD,Normal,128950,13.835837
140,0,0,0.0,0,0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0.0,0,0,0,0,0,0,0,0,0.000000


In [165]:
df.drop(['PricePerArea', 'VeryUsefulColumn'], axis=1)

Id,MSSubClass,MSZoning,LotFrontage,LotArea,Street,Alley,LotShape,LandContour,Utilities,LotConfig,LandSlope,Neighborhood,Condition1,Condition2,BldgType,...,WoodDeckSF,OpenPorchSF,EnclosedPorch,3SsnPorch,ScreenPorch,PoolArea,PoolQC,Fence,MiscFeature,MiscVal,MoSold,YrSold,SaleType,SaleCondition,SalePrice
1,60,RL,65.0,8450,Pave,,Reg,Lvl,AllPub,Inside,Gtl,CollgCr,Norm,Norm,1Fam,...,0,61,0,0,0,0,,,,0,2,2008,WD,Normal,208500
2,20,RL,80.0,9600,Pave,,Reg,Lvl,AllPub,FR2,Gtl,Veenker,Feedr,Norm,1Fam,...,298,0,0,0,0,0,,,,0,5,2007,WD,Normal,181500
3,60,RL,68.0,11250,Pave,,IR1,Lvl,AllPub,Inside,Gtl,CollgCr,Norm,Norm,1Fam,...,0,42,0,0,0,0,,,,0,9,2008,WD,Normal,223500
4,70,RL,60.0,9550,Pave,,IR1,Lvl,AllPub,Corner,Gtl,Crawfor,Norm,Norm,1Fam,...,0,35,272,0,0,0,,,,0,2,2006,WD,Abnorml,140000
5,60,RL,84.0,14260,Pave,,IR1,Lvl,AllPub,FR2,Gtl,NoRidge,Norm,Norm,1Fam,...,192,84,0,0,0,0,,,,0,12,2008,WD,Normal,250000
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
98,20,RL,73.0,10921,Pave,,Reg,HLS,AllPub,Inside,Gtl,Edwards,Norm,Norm,1Fam,...,120,0,0,0,0,0,,,,0,5,2007,WD,Normal,94750
99,30,RL,85.0,10625,Pave,,Reg,Lvl,AllPub,Corner,Gtl,Edwards,Norm,Norm,1Fam,...,0,0,77,0,0,0,,,Shed,400,5,2010,COD,Abnorml,83000
100,20,RL,77.0,9320,Pave,,IR1,Lvl,AllPub,Inside,Gtl,NAmes,Norm,Norm,1Fam,...,352,0,0,0,0,0,,,Shed,400,1,2010,WD,Normal,128950
140,0,0,0.0,0,0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0.0,0,0,0,0,0,0,0,0


In [166]:
df.drop(['PricePerArea', 'VeryUsefulColumn'], axis=1, inplace=True)

In [167]:
df

Id,MSSubClass,MSZoning,LotFrontage,LotArea,Street,Alley,LotShape,LandContour,Utilities,LotConfig,LandSlope,Neighborhood,Condition1,Condition2,BldgType,...,WoodDeckSF,OpenPorchSF,EnclosedPorch,3SsnPorch,ScreenPorch,PoolArea,PoolQC,Fence,MiscFeature,MiscVal,MoSold,YrSold,SaleType,SaleCondition,SalePrice
1,60,RL,65.0,8450,Pave,,Reg,Lvl,AllPub,Inside,Gtl,CollgCr,Norm,Norm,1Fam,...,0,61,0,0,0,0,,,,0,2,2008,WD,Normal,208500
2,20,RL,80.0,9600,Pave,,Reg,Lvl,AllPub,FR2,Gtl,Veenker,Feedr,Norm,1Fam,...,298,0,0,0,0,0,,,,0,5,2007,WD,Normal,181500
3,60,RL,68.0,11250,Pave,,IR1,Lvl,AllPub,Inside,Gtl,CollgCr,Norm,Norm,1Fam,...,0,42,0,0,0,0,,,,0,9,2008,WD,Normal,223500
4,70,RL,60.0,9550,Pave,,IR1,Lvl,AllPub,Corner,Gtl,Crawfor,Norm,Norm,1Fam,...,0,35,272,0,0,0,,,,0,2,2006,WD,Abnorml,140000
5,60,RL,84.0,14260,Pave,,IR1,Lvl,AllPub,FR2,Gtl,NoRidge,Norm,Norm,1Fam,...,192,84,0,0,0,0,,,,0,12,2008,WD,Normal,250000
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
98,20,RL,73.0,10921,Pave,,Reg,HLS,AllPub,Inside,Gtl,Edwards,Norm,Norm,1Fam,...,120,0,0,0,0,0,,,,0,5,2007,WD,Normal,94750
99,30,RL,85.0,10625,Pave,,Reg,Lvl,AllPub,Corner,Gtl,Edwards,Norm,Norm,1Fam,...,0,0,77,0,0,0,,,Shed,400,5,2010,COD,Abnorml,83000
100,20,RL,77.0,9320,Pave,,IR1,Lvl,AllPub,Inside,Gtl,NAmes,Norm,Norm,1Fam,...,352,0,0,0,0,0,,,Shed,400,1,2010,WD,Normal,128950
140,0,0,0.0,0,0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0.0,0,0,0,0,0,0,0,0
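
Without `inplace=True`, `drop` returns a new table and leaves `df` as it was, which is why cells 164 and 165 did not change `df`. An equivalent form is to reassign or store the result and name the axis through the `columns=` keyword instead of `axis=1`. A sketch on made-up data (the names `toy` and `trimmed` are illustrative):

```python
import pandas as pd

toy = pd.DataFrame({'SalePrice': [208500, 181500],
                    'PricePerArea': [24.67, 18.91],
                    'VeryUsefulColumn': [1, 0]},
                   index=pd.Index([1, 2], name='Id'))

# drop returns a new DataFrame; `toy` still has all three columns
trimmed = toy.drop(columns=['PricePerArea', 'VeryUsefulColumn'])

print(list(toy.columns))
print(list(trimmed.columns))
```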


### Rows

In [168]:
df.drop(140, axis=0)

Id,MSSubClass,MSZoning,LotFrontage,LotArea,Street,Alley,LotShape,LandContour,Utilities,LotConfig,LandSlope,Neighborhood,Condition1,Condition2,BldgType,...,WoodDeckSF,OpenPorchSF,EnclosedPorch,3SsnPorch,ScreenPorch,PoolArea,PoolQC,Fence,MiscFeature,MiscVal,MoSold,YrSold,SaleType,SaleCondition,SalePrice
1,60,RL,65.0,8450,Pave,,Reg,Lvl,AllPub,Inside,Gtl,CollgCr,Norm,Norm,1Fam,...,0,61,0,0,0,0,,,,0,2,2008,WD,Normal,208500
2,20,RL,80.0,9600,Pave,,Reg,Lvl,AllPub,FR2,Gtl,Veenker,Feedr,Norm,1Fam,...,298,0,0,0,0,0,,,,0,5,2007,WD,Normal,181500
3,60,RL,68.0,11250,Pave,,IR1,Lvl,AllPub,Inside,Gtl,CollgCr,Norm,Norm,1Fam,...,0,42,0,0,0,0,,,,0,9,2008,WD,Normal,223500
4,70,RL,60.0,9550,Pave,,IR1,Lvl,AllPub,Corner,Gtl,Crawfor,Norm,Norm,1Fam,...,0,35,272,0,0,0,,,,0,2,2006,WD,Abnorml,140000
5,60,RL,84.0,14260,Pave,,IR1,Lvl,AllPub,FR2,Gtl,NoRidge,Norm,Norm,1Fam,...,192,84,0,0,0,0,,,,0,12,2008,WD,Normal,250000
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
97,20,RL,78.0,10264,Pave,,IR1,Lvl,AllPub,Inside,Gtl,CollgCr,Norm,Norm,1Fam,...,158,29,0,0,0,0,,,,0,8,2006,WD,Normal,214000
98,20,RL,73.0,10921,Pave,,Reg,HLS,AllPub,Inside,Gtl,Edwards,Norm,Norm,1Fam,...,120,0,0,0,0,0,,,,0,5,2007,WD,Normal,94750
99,30,RL,85.0,10625,Pave,,Reg,Lvl,AllPub,Corner,Gtl,Edwards,Norm,Norm,1Fam,...,0,0,77,0,0,0,,,Shed,400,5,2010,COD,Abnorml,83000
100,20,RL,77.0,9320,Pave,,IR1,Lvl,AllPub,Inside,Gtl,NAmes,Norm,Norm,1Fam,...,352,0,0,0,0,0,,,Shed,400,1,2010,WD,Normal,128950


In [169]:
df.drop([140, 'count'], axis=0)

Id,MSSubClass,MSZoning,LotFrontage,LotArea,Street,Alley,LotShape,LandContour,Utilities,LotConfig,LandSlope,Neighborhood,Condition1,Condition2,BldgType,...,WoodDeckSF,OpenPorchSF,EnclosedPorch,3SsnPorch,ScreenPorch,PoolArea,PoolQC,Fence,MiscFeature,MiscVal,MoSold,YrSold,SaleType,SaleCondition,SalePrice
1,60,RL,65.0,8450,Pave,,Reg,Lvl,AllPub,Inside,Gtl,CollgCr,Norm,Norm,1Fam,...,0,61,0,0,0,0,,,,0,2,2008,WD,Normal,208500
2,20,RL,80.0,9600,Pave,,Reg,Lvl,AllPub,FR2,Gtl,Veenker,Feedr,Norm,1Fam,...,298,0,0,0,0,0,,,,0,5,2007,WD,Normal,181500
3,60,RL,68.0,11250,Pave,,IR1,Lvl,AllPub,Inside,Gtl,CollgCr,Norm,Norm,1Fam,...,0,42,0,0,0,0,,,,0,9,2008,WD,Normal,223500
4,70,RL,60.0,9550,Pave,,IR1,Lvl,AllPub,Corner,Gtl,Crawfor,Norm,Norm,1Fam,...,0,35,272,0,0,0,,,,0,2,2006,WD,Abnorml,140000
5,60,RL,84.0,14260,Pave,,IR1,Lvl,AllPub,FR2,Gtl,NoRidge,Norm,Norm,1Fam,...,192,84,0,0,0,0,,,,0,12,2008,WD,Normal,250000
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
96,60,RL,,9765,Pave,,IR2,Lvl,AllPub,Corner,Gtl,Gilbert,Norm,Norm,1Fam,...,232,63,0,0,0,0,,,Shed,480,4,2009,WD,Normal,185000
97,20,RL,78.0,10264,Pave,,IR1,Lvl,AllPub,Inside,Gtl,CollgCr,Norm,Norm,1Fam,...,158,29,0,0,0,0,,,,0,8,2006,WD,Normal,214000
98,20,RL,73.0,10921,Pave,,Reg,HLS,AllPub,Inside,Gtl,Edwards,Norm,Norm,1Fam,...,120,0,0,0,0,0,,,,0,5,2007,WD,Normal,94750
99,30,RL,85.0,10625,Pave,,Reg,Lvl,AllPub,Corner,Gtl,Edwards,Norm,Norm,1Fam,...,0,0,77,0,0,0,,,Shed,400,5,2010,COD,Abnorml,83000


In [170]:
df.drop([140, 'count'], axis=0, inplace=True)

In [171]:
df

Id,MSSubClass,MSZoning,LotFrontage,LotArea,Street,Alley,LotShape,LandContour,Utilities,LotConfig,LandSlope,Neighborhood,Condition1,Condition2,BldgType,...,WoodDeckSF,OpenPorchSF,EnclosedPorch,3SsnPorch,ScreenPorch,PoolArea,PoolQC,Fence,MiscFeature,MiscVal,MoSold,YrSold,SaleType,SaleCondition,SalePrice
1,60,RL,65.0,8450,Pave,,Reg,Lvl,AllPub,Inside,Gtl,CollgCr,Norm,Norm,1Fam,...,0,61,0,0,0,0,,,,0,2,2008,WD,Normal,208500
2,20,RL,80.0,9600,Pave,,Reg,Lvl,AllPub,FR2,Gtl,Veenker,Feedr,Norm,1Fam,...,298,0,0,0,0,0,,,,0,5,2007,WD,Normal,181500
3,60,RL,68.0,11250,Pave,,IR1,Lvl,AllPub,Inside,Gtl,CollgCr,Norm,Norm,1Fam,...,0,42,0,0,0,0,,,,0,9,2008,WD,Normal,223500
4,70,RL,60.0,9550,Pave,,IR1,Lvl,AllPub,Corner,Gtl,Crawfor,Norm,Norm,1Fam,...,0,35,272,0,0,0,,,,0,2,2006,WD,Abnorml,140000
5,60,RL,84.0,14260,Pave,,IR1,Lvl,AllPub,FR2,Gtl,NoRidge,Norm,Norm,1Fam,...,192,84,0,0,0,0,,,,0,12,2008,WD,Normal,250000
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
96,60,RL,,9765,Pave,,IR2,Lvl,AllPub,Corner,Gtl,Gilbert,Norm,Norm,1Fam,...,232,63,0,0,0,0,,,Shed,480,4,2009,WD,Normal,185000
97,20,RL,78.0,10264,Pave,,IR1,Lvl,AllPub,Inside,Gtl,CollgCr,Norm,Norm,1Fam,...,158,29,0,0,0,0,,,,0,8,2006,WD,Normal,214000
98,20,RL,73.0,10921,Pave,,Reg,HLS,AllPub,Inside,Gtl,Edwards,Norm,Norm,1Fam,...,120,0,0,0,0,0,,,,0,5,2007,WD,Normal,94750
99,30,RL,85.0,10625,Pave,,Reg,Lvl,AllPub,Corner,Gtl,Edwards,Norm,Norm,1Fam,...,0,0,77,0,0,0,,,Shed,400,5,2010,COD,Abnorml,83000
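
Dropping a label that is not in the index raises a `KeyError`, which matters when a cell like the one above is run a second time after the rows are already gone. The sketch below shows the `index=` keyword (equivalent to `axis=0`) together with `errors='ignore'`, which skips missing labels. The table and the absent label `141` are made up for illustration.

```python
import pandas as pd

toy = pd.DataFrame({'SalePrice': [208500, 181500, 0]},
                   index=pd.Index([1, 2, 140], name='Id'))

# 141 is not in the index; errors='ignore' skips it instead of raising
cleaned = toy.drop(index=[140, 141], errors='ignore')

# Running the same drop on the result is then harmless
cleaned_again = cleaned.drop(index=[140, 141], errors='ignore')

print(cleaned)
```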


### Removing duplicates

In [172]:
df.drop_duplicates()

Id,MSSubClass,MSZoning,LotFrontage,LotArea,Street,Alley,LotShape,LandContour,Utilities,LotConfig,LandSlope,Neighborhood,Condition1,Condition2,BldgType,...,WoodDeckSF,OpenPorchSF,EnclosedPorch,3SsnPorch,ScreenPorch,PoolArea,PoolQC,Fence,MiscFeature,MiscVal,MoSold,YrSold,SaleType,SaleCondition,SalePrice
1,60,RL,65.0,8450,Pave,,Reg,Lvl,AllPub,Inside,Gtl,CollgCr,Norm,Norm,1Fam,...,0,61,0,0,0,0,,,,0,2,2008,WD,Normal,208500
2,20,RL,80.0,9600,Pave,,Reg,Lvl,AllPub,FR2,Gtl,Veenker,Feedr,Norm,1Fam,...,298,0,0,0,0,0,,,,0,5,2007,WD,Normal,181500
3,60,RL,68.0,11250,Pave,,IR1,Lvl,AllPub,Inside,Gtl,CollgCr,Norm,Norm,1Fam,...,0,42,0,0,0,0,,,,0,9,2008,WD,Normal,223500
4,70,RL,60.0,9550,Pave,,IR1,Lvl,AllPub,Corner,Gtl,Crawfor,Norm,Norm,1Fam,...,0,35,272,0,0,0,,,,0,2,2006,WD,Abnorml,140000
5,60,RL,84.0,14260,Pave,,IR1,Lvl,AllPub,FR2,Gtl,NoRidge,Norm,Norm,1Fam,...,192,84,0,0,0,0,,,,0,12,2008,WD,Normal,250000
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
96,60,RL,,9765,Pave,,IR2,Lvl,AllPub,Corner,Gtl,Gilbert,Norm,Norm,1Fam,...,232,63,0,0,0,0,,,Shed,480,4,2009,WD,Normal,185000
97,20,RL,78.0,10264,Pave,,IR1,Lvl,AllPub,Inside,Gtl,CollgCr,Norm,Norm,1Fam,...,158,29,0,0,0,0,,,,0,8,2006,WD,Normal,214000
98,20,RL,73.0,10921,Pave,,Reg,HLS,AllPub,Inside,Gtl,Edwards,Norm,Norm,1Fam,...,120,0,0,0,0,0,,,,0,5,2007,WD,Normal,94750
99,30,RL,85.0,10625,Pave,,Reg,Lvl,AllPub,Corner,Gtl,Edwards,Norm,Norm,1Fam,...,0,0,77,0,0,0,,,Shed,400,5,2010,COD,Abnorml,83000


In [173]:
df.drop_duplicates(subset=['MSSubClass', 'MSZoning'])

Id,MSSubClass,MSZoning,LotFrontage,LotArea,Street,Alley,LotShape,LandContour,Utilities,LotConfig,LandSlope,Neighborhood,Condition1,Condition2,BldgType,...,WoodDeckSF,OpenPorchSF,EnclosedPorch,3SsnPorch,ScreenPorch,PoolArea,PoolQC,Fence,MiscFeature,MiscVal,MoSold,YrSold,SaleType,SaleCondition,SalePrice
1,60,RL,65.0,8450,Pave,,Reg,Lvl,AllPub,Inside,Gtl,CollgCr,Norm,Norm,1Fam,...,0,61,0,0,0,0,,,,0,2,2008,WD,Normal,208500
2,20,RL,80.0,9600,Pave,,Reg,Lvl,AllPub,FR2,Gtl,Veenker,Feedr,Norm,1Fam,...,298,0,0,0,0,0,,,,0,5,2007,WD,Normal,181500
4,70,RL,60.0,9550,Pave,,IR1,Lvl,AllPub,Corner,Gtl,Crawfor,Norm,Norm,1Fam,...,0,35,272,0,0,0,,,,0,2,2006,WD,Abnorml,140000
6,50,RL,85.0,14115,Pave,,IR1,Lvl,AllPub,Inside,Gtl,Mitchel,Norm,Norm,1Fam,...,40,30,0,320,0,0,,MnPrv,Shed,700,10,2009,WD,Normal,143000
9,50,RM,51.0,6120,Pave,,Reg,Lvl,AllPub,Inside,Gtl,OldTown,Artery,Norm,1Fam,...,90,0,205,0,0,0,,,,0,4,2008,WD,Abnorml,129900
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
64,70,RM,50.0,10300,Pave,,IR1,Bnk,AllPub,Inside,Gtl,OldTown,RRAn,Feedr,1Fam,...,12,11,64,0,0,0,,GdPrv,,0,4,2010,WD,Normal,140000
76,180,RM,21.0,1596,Pave,,Reg,Lvl,AllPub,Inside,Gtl,MeadowV,Norm,Norm,Twnhs,...,120,101,0,0,0,0,,GdWo,,0,11,2009,WD,Normal,91000
89,50,C (all),105.0,8470,Pave,,IR1,Lvl,AllPub,Corner,Gtl,IDOTRR,Feedr,Feedr,1Fam,...,0,0,156,0,0,0,,MnPrv,,0,10,2009,ConLD,Abnorml,85000
93,30,RL,80.0,13360,Pave,Grvl,IR1,HLS,AllPub,Inside,Gtl,Crawfor,Norm,Norm,1Fam,...,0,0,44,0,0,0,,,,0,8,2009,WD,Normal,163500
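
To make the effect of `subset` easier to check by eye, here is a minimal sketch on a small made-up table (the column names mirror the dataset, the values and the name `toy` are hypothetical). It also shows the `keep` parameter, which controls which of the duplicated rows survives.

```python
import pandas as pd

# Hypothetical toy table: column names follow the dataset, values are made up.
toy = pd.DataFrame(
    {
        "MSSubClass": [60, 20, 60, 70, 20],
        "MSZoning": ["RL", "RL", "RL", "RL", "RM"],
        "SalePrice": [208500, 181500, 223500, 140000, 129900],
    },
    index=pd.Index([1, 2, 3, 4, 5], name="Id"),
)

# Keep the first row for every distinct (MSSubClass, MSZoning) pair.
first_rows = toy.drop_duplicates(subset=["MSSubClass", "MSZoning"])

# keep="last" keeps the last occurrence of each pair instead.
last_rows = toy.drop_duplicates(subset=["MSSubClass", "MSZoning"], keep="last")

print(first_rows.index.tolist())  # rows 1 and 3 share a pair, row 3 is dropped
print(last_rows.index.tolist())   # here row 1 is dropped instead
```

Only the columns listed in `subset` are compared; the other columns (here `SalePrice`) are carried along from whichever row is kept.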


### Dropping rows/columns with missing values

In [174]:
df.dropna()

Id,MSSubClass,MSZoning,LotFrontage,LotArea,Street,Alley,LotShape,LandContour,Utilities,LotConfig,LandSlope,Neighborhood,Condition1,Condition2,BldgType,...,WoodDeckSF,OpenPorchSF,EnclosedPorch,3SsnPorch,ScreenPorch,PoolArea,PoolQC,Fence,MiscFeature,MiscVal,MoSold,YrSold,SaleType,SaleCondition,SalePrice
40,0,0,0.0,0,0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0.0,0,0,0,0,0,0,0,0


In [175]:
df.dropna(axis=1)

Id,MSSubClass,MSZoning,LotArea,Street,LotShape,LandContour,Utilities,LotConfig,LandSlope,Neighborhood,Condition1,Condition2,BldgType,HouseStyle,OverallQual,...,GarageCars,GarageArea,PavedDrive,WoodDeckSF,OpenPorchSF,EnclosedPorch,3SsnPorch,ScreenPorch,PoolArea,MiscVal,MoSold,YrSold,SaleType,SaleCondition,SalePrice
1,60,RL,8450,Pave,Reg,Lvl,AllPub,Inside,Gtl,CollgCr,Norm,Norm,1Fam,2Story,7,...,2,548,Y,0,61,0,0,0,0,0,2,2008,WD,Normal,208500
2,20,RL,9600,Pave,Reg,Lvl,AllPub,FR2,Gtl,Veenker,Feedr,Norm,1Fam,1Story,6,...,2,460,Y,298,0,0,0,0,0,0,5,2007,WD,Normal,181500
3,60,RL,11250,Pave,IR1,Lvl,AllPub,Inside,Gtl,CollgCr,Norm,Norm,1Fam,2Story,7,...,2,608,Y,0,42,0,0,0,0,0,9,2008,WD,Normal,223500
4,70,RL,9550,Pave,IR1,Lvl,AllPub,Corner,Gtl,Crawfor,Norm,Norm,1Fam,2Story,7,...,3,642,Y,0,35,272,0,0,0,0,2,2006,WD,Abnorml,140000
5,60,RL,14260,Pave,IR1,Lvl,AllPub,FR2,Gtl,NoRidge,Norm,Norm,1Fam,2Story,8,...,3,836,Y,192,84,0,0,0,0,0,12,2008,WD,Normal,250000
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
96,60,RL,9765,Pave,IR2,Lvl,AllPub,Corner,Gtl,Gilbert,Norm,Norm,1Fam,2Story,6,...,2,420,Y,232,63,0,0,0,0,480,4,2009,WD,Normal,185000
97,20,RL,10264,Pave,IR1,Lvl,AllPub,Inside,Gtl,CollgCr,Norm,Norm,1Fam,1Story,7,...,2,472,Y,158,29,0,0,0,0,0,8,2006,WD,Normal,214000
98,20,RL,10921,Pave,Reg,HLS,AllPub,Inside,Gtl,Edwards,Norm,Norm,1Fam,1Story,4,...,1,432,P,120,0,0,0,0,0,0,5,2007,WD,Normal,94750
99,30,RL,10625,Pave,Reg,Lvl,AllPub,Corner,Gtl,Edwards,Norm,Norm,1Fam,1Story,5,...,1,366,Y,0,0,77,0,0,0,400,5,2010,COD,Abnorml,83000


In [176]:
df.dropna(how='all')

Id,MSSubClass,MSZoning,LotFrontage,LotArea,Street,Alley,LotShape,LandContour,Utilities,LotConfig,LandSlope,Neighborhood,Condition1,Condition2,BldgType,...,WoodDeckSF,OpenPorchSF,EnclosedPorch,3SsnPorch,ScreenPorch,PoolArea,PoolQC,Fence,MiscFeature,MiscVal,MoSold,YrSold,SaleType,SaleCondition,SalePrice
1,60,RL,65.0,8450,Pave,,Reg,Lvl,AllPub,Inside,Gtl,CollgCr,Norm,Norm,1Fam,...,0,61,0,0,0,0,,,,0,2,2008,WD,Normal,208500
2,20,RL,80.0,9600,Pave,,Reg,Lvl,AllPub,FR2,Gtl,Veenker,Feedr,Norm,1Fam,...,298,0,0,0,0,0,,,,0,5,2007,WD,Normal,181500
3,60,RL,68.0,11250,Pave,,IR1,Lvl,AllPub,Inside,Gtl,CollgCr,Norm,Norm,1Fam,...,0,42,0,0,0,0,,,,0,9,2008,WD,Normal,223500
4,70,RL,60.0,9550,Pave,,IR1,Lvl,AllPub,Corner,Gtl,Crawfor,Norm,Norm,1Fam,...,0,35,272,0,0,0,,,,0,2,2006,WD,Abnorml,140000
5,60,RL,84.0,14260,Pave,,IR1,Lvl,AllPub,FR2,Gtl,NoRidge,Norm,Norm,1Fam,...,192,84,0,0,0,0,,,,0,12,2008,WD,Normal,250000
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
96,60,RL,,9765,Pave,,IR2,Lvl,AllPub,Corner,Gtl,Gilbert,Norm,Norm,1Fam,...,232,63,0,0,0,0,,,Shed,480,4,2009,WD,Normal,185000
97,20,RL,78.0,10264,Pave,,IR1,Lvl,AllPub,Inside,Gtl,CollgCr,Norm,Norm,1Fam,...,158,29,0,0,0,0,,,,0,8,2006,WD,Normal,214000
98,20,RL,73.0,10921,Pave,,Reg,HLS,AllPub,Inside,Gtl,Edwards,Norm,Norm,1Fam,...,120,0,0,0,0,0,,,,0,5,2007,WD,Normal,94750
99,30,RL,85.0,10625,Pave,,Reg,Lvl,AllPub,Corner,Gtl,Edwards,Norm,Norm,1Fam,...,0,0,77,0,0,0,,,Shed,400,5,2010,COD,Abnorml,83000
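
The three `dropna` calls above differ only in the axis and in how many missing values trigger a drop. The sketch below replays them on a tiny made-up table (the name `toy` and all values are hypothetical) and adds two further parameters of `dropna`, `subset` and `thresh`, which are often more practical than dropping every incomplete row.

```python
import numpy as np
import pandas as pd

# Hypothetical toy table with missing values; values are made up.
toy = pd.DataFrame(
    {
        "LotFrontage": [65.0, np.nan, 68.0, np.nan],
        "Alley": [np.nan, np.nan, "Grvl", np.nan],
        "SalePrice": [208500.0, 181500.0, 223500.0, np.nan],
    },
    index=pd.Index([1, 2, 3, 4], name="Id"),
)

rows_complete = toy.dropna()              # rows without any missing value
cols_complete = toy.dropna(axis=1)        # columns without any missing value
rows_not_empty = toy.dropna(how="all")    # drop rows where everything is missing
rows_with_price = toy.dropna(subset=["SalePrice"])  # look at SalePrice only
rows_two_values = toy.dropna(thresh=2)    # keep rows with at least 2 known values

print(rows_complete.index.tolist())
print(cols_complete.shape)
print(rows_not_empty.index.tolist())
print(rows_two_values.index.tolist())
```

In this toy table every column contains at least one missing value, so `dropna(axis=1)` leaves no columns at all, which is the same reason the default `dropna()` on the real data keeps so few rows.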


## Replacing values and renaming

Row and column names:

In [177]:
df.rename(columns={'SalePrice' : 'Price', 'YrSold' : 'YearSold'})

Id,MSSubClass,MSZoning,LotFrontage,LotArea,Street,Alley,LotShape,LandContour,Utilities,LotConfig,LandSlope,Neighborhood,Condition1,Condition2,BldgType,...,WoodDeckSF,OpenPorchSF,EnclosedPorch,3SsnPorch,ScreenPorch,PoolArea,PoolQC,Fence,MiscFeature,MiscVal,MoSold,YearSold,SaleType,SaleCondition,Price
1,60,RL,65.0,8450,Pave,,Reg,Lvl,AllPub,Inside,Gtl,CollgCr,Norm,Norm,1Fam,...,0,61,0,0,0,0,,,,0,2,2008,WD,Normal,208500
2,20,RL,80.0,9600,Pave,,Reg,Lvl,AllPub,FR2,Gtl,Veenker,Feedr,Norm,1Fam,...,298,0,0,0,0,0,,,,0,5,2007,WD,Normal,181500
3,60,RL,68.0,11250,Pave,,IR1,Lvl,AllPub,Inside,Gtl,CollgCr,Norm,Norm,1Fam,...,0,42,0,0,0,0,,,,0,9,2008,WD,Normal,223500
4,70,RL,60.0,9550,Pave,,IR1,Lvl,AllPub,Corner,Gtl,Crawfor,Norm,Norm,1Fam,...,0,35,272,0,0,0,,,,0,2,2006,WD,Abnorml,140000
5,60,RL,84.0,14260,Pave,,IR1,Lvl,AllPub,FR2,Gtl,NoRidge,Norm,Norm,1Fam,...,192,84,0,0,0,0,,,,0,12,2008,WD,Normal,250000
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
96,60,RL,,9765,Pave,,IR2,Lvl,AllPub,Corner,Gtl,Gilbert,Norm,Norm,1Fam,...,232,63,0,0,0,0,,,Shed,480,4,2009,WD,Normal,185000
97,20,RL,78.0,10264,Pave,,IR1,Lvl,AllPub,Inside,Gtl,CollgCr,Norm,Norm,1Fam,...,158,29,0,0,0,0,,,,0,8,2006,WD,Normal,214000
98,20,RL,73.0,10921,Pave,,Reg,HLS,AllPub,Inside,Gtl,Edwards,Norm,Norm,1Fam,...,120,0,0,0,0,0,,,,0,5,2007,WD,Normal,94750
99,30,RL,85.0,10625,Pave,,Reg,Lvl,AllPub,Corner,Gtl,Edwards,Norm,Norm,1Fam,...,0,0,77,0,0,0,,,Shed,400,5,2010,COD,Abnorml,83000


In [178]:
df.rename(columns=str.upper)

Id,MSSUBCLASS,MSZONING,LOTFRONTAGE,LOTAREA,STREET,ALLEY,LOTSHAPE,LANDCONTOUR,UTILITIES,LOTCONFIG,LANDSLOPE,NEIGHBORHOOD,CONDITION1,CONDITION2,BLDGTYPE,...,WOODDECKSF,OPENPORCHSF,ENCLOSEDPORCH,3SSNPORCH,SCREENPORCH,POOLAREA,POOLQC,FENCE,MISCFEATURE,MISCVAL,MOSOLD,YRSOLD,SALETYPE,SALECONDITION,SALEPRICE
1,60,RL,65.0,8450,Pave,,Reg,Lvl,AllPub,Inside,Gtl,CollgCr,Norm,Norm,1Fam,...,0,61,0,0,0,0,,,,0,2,2008,WD,Normal,208500
2,20,RL,80.0,9600,Pave,,Reg,Lvl,AllPub,FR2,Gtl,Veenker,Feedr,Norm,1Fam,...,298,0,0,0,0,0,,,,0,5,2007,WD,Normal,181500
3,60,RL,68.0,11250,Pave,,IR1,Lvl,AllPub,Inside,Gtl,CollgCr,Norm,Norm,1Fam,...,0,42,0,0,0,0,,,,0,9,2008,WD,Normal,223500
4,70,RL,60.0,9550,Pave,,IR1,Lvl,AllPub,Corner,Gtl,Crawfor,Norm,Norm,1Fam,...,0,35,272,0,0,0,,,,0,2,2006,WD,Abnorml,140000
5,60,RL,84.0,14260,Pave,,IR1,Lvl,AllPub,FR2,Gtl,NoRidge,Norm,Norm,1Fam,...,192,84,0,0,0,0,,,,0,12,2008,WD,Normal,250000
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
96,60,RL,,9765,Pave,,IR2,Lvl,AllPub,Corner,Gtl,Gilbert,Norm,Norm,1Fam,...,232,63,0,0,0,0,,,Shed,480,4,2009,WD,Normal,185000
97,20,RL,78.0,10264,Pave,,IR1,Lvl,AllPub,Inside,Gtl,CollgCr,Norm,Norm,1Fam,...,158,29,0,0,0,0,,,,0,8,2006,WD,Normal,214000
98,20,RL,73.0,10921,Pave,,Reg,HLS,AllPub,Inside,Gtl,Edwards,Norm,Norm,1Fam,...,120,0,0,0,0,0,,,,0,5,2007,WD,Normal,94750
99,30,RL,85.0,10625,Pave,,Reg,Lvl,AllPub,Corner,Gtl,Edwards,Norm,Norm,1Fam,...,0,0,77,0,0,0,,,Shed,400,5,2010,COD,Abnorml,83000
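
A point that is easy to miss in the two cells above: `rename` returns a new table and leaves `df` itself untouched. The following sketch, on a made-up two-row table (the name `toy` and the labels `"first"`/`"second"` are hypothetical), shows this together with renaming row labels through the `index` argument.

```python
import pandas as pd

# Hypothetical toy table; values are made up.
toy = pd.DataFrame(
    {"YrSold": [2008, 2007], "SalePrice": [208500, 181500]},
    index=pd.Index([1, 2], name="Id"),
)

# Rename columns with a mapping; columns missing from the mapping keep their names.
renamed = toy.rename(columns={"SalePrice": "Price", "YrSold": "YearSold"})

# Rename row labels the same way through the index argument.
relabeled = toy.rename(index={1: "first", 2: "second"})

# A function is applied to every label.
lowered = toy.rename(columns=str.lower)

print(list(renamed.columns))
print(relabeled.index.tolist())
print(list(toy.columns))  # the original table is unchanged
```

To keep the new names, assign the result back, for example `toy = toy.rename(...)`.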


Values:

In [179]:
df.CentralAir

Id
1      Y
2      Y
3      Y
4      Y
5      Y
      ..
96     Y
97     Y
98     Y
99     N
100    Y
Name: CentralAir, Length: 100, dtype: object

In [203]:
central_air = df.replace({'CentralAir' : {'Y' : True, 'N' : False}}).CentralAir

In [201]:
df.replace('Norm', np.nan).Condition1

Id
1        NaN
2      Feedr
3        NaN
4        NaN
5        NaN
       ...  
96       NaN
97       NaN
98       NaN
99       NaN
100      NaN
Name: Condition1, Length: 100, dtype: object

After the replacement, the column can be cast to the proper type:

In [217]:
central_air = central_air.astype(bool)
central_air

Id
1       True
2       True
3       True
4       True
5       True
       ...  
96      True
97      True
98      True
99     False
100     True
Name: CentralAir, Length: 100, dtype: bool
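
As an alternative sketch of the same Y/N to boolean conversion, `Series.map` with a dictionary can be used on a single column. The series below is made up and the variable names are hypothetical. One assumption worth stating: values absent from the mapping become `NaN`, and `astype(bool)` would silently turn those into `True`, so the sketch checks for them first.

```python
import pandas as pd

# Hypothetical column with the same Y/N coding as CentralAir; values are made up.
central_air_raw = pd.Series(
    ["Y", "Y", "N", "Y"],
    index=pd.Index([1, 2, 3, 4], name="Id"),
    name="CentralAir",
)

# Map each label to a boolean; labels missing from the dict would become NaN.
mapped = central_air_raw.map({"Y": True, "N": False})

# Guard against unexpected labels before casting, because NaN casts to True.
if mapped.isna().any():
    raise ValueError("unexpected value in CentralAir")

as_bool = mapped.astype(bool)
print(as_bool.tolist())
print(as_bool.dtype)
```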

Replacing missing values:

In [182]:
df.fillna(-1000)

Id,MSSubClass,MSZoning,LotFrontage,LotArea,Street,Alley,LotShape,LandContour,Utilities,LotConfig,LandSlope,Neighborhood,Condition1,Condition2,BldgType,...,WoodDeckSF,OpenPorchSF,EnclosedPorch,3SsnPorch,ScreenPorch,PoolArea,PoolQC,Fence,MiscFeature,MiscVal,MoSold,YrSold,SaleType,SaleCondition,SalePrice
1,60,RL,65.0,8450,Pave,-1000,Reg,Lvl,AllPub,Inside,Gtl,CollgCr,Norm,Norm,1Fam,...,0,61,0,0,0,0,-1000.0,-1000,-1000,0,2,2008,WD,Normal,208500
2,20,RL,80.0,9600,Pave,-1000,Reg,Lvl,AllPub,FR2,Gtl,Veenker,Feedr,Norm,1Fam,...,298,0,0,0,0,0,-1000.0,-1000,-1000,0,5,2007,WD,Normal,181500
3,60,RL,68.0,11250,Pave,-1000,IR1,Lvl,AllPub,Inside,Gtl,CollgCr,Norm,Norm,1Fam,...,0,42,0,0,0,0,-1000.0,-1000,-1000,0,9,2008,WD,Normal,223500
4,70,RL,60.0,9550,Pave,-1000,IR1,Lvl,AllPub,Corner,Gtl,Crawfor,Norm,Norm,1Fam,...,0,35,272,0,0,0,-1000.0,-1000,-1000,0,2,2006,WD,Abnorml,140000
5,60,RL,84.0,14260,Pave,-1000,IR1,Lvl,AllPub,FR2,Gtl,NoRidge,Norm,Norm,1Fam,...,192,84,0,0,0,0,-1000.0,-1000,-1000,0,12,2008,WD,Normal,250000
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
96,60,RL,-1000.0,9765,Pave,-1000,IR2,Lvl,AllPub,Corner,Gtl,Gilbert,Norm,Norm,1Fam,...,232,63,0,0,0,0,-1000.0,-1000,Shed,480,4,2009,WD,Normal,185000
97,20,RL,78.0,10264,Pave,-1000,IR1,Lvl,AllPub,Inside,Gtl,CollgCr,Norm,Norm,1Fam,...,158,29,0,0,0,0,-1000.0,-1000,-1000,0,8,2006,WD,Normal,214000
98,20,RL,73.0,10921,Pave,-1000,Reg,HLS,AllPub,Inside,Gtl,Edwards,Norm,Norm,1Fam,...,120,0,0,0,0,0,-1000.0,-1000,-1000,0,5,2007,WD,Normal,94750
99,30,RL,85.0,10625,Pave,-1000,Reg,Lvl,AllPub,Corner,Gtl,Edwards,Norm,Norm,1Fam,...,0,0,77,0,0,0,-1000.0,-1000,Shed,400,5,2010,COD,Abnorml,83000
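
Filling every column with the same number, as above, puts `-1000` into text columns such as `Alley` and `Fence` as well. A possible refinement, sketched here on made-up data, is to pass `fillna` a dictionary with one fill value per column. The label `"NoFence"` and the choice of the median are assumptions made for illustration only.

```python
import numpy as np
import pandas as pd

# Hypothetical toy table; values are made up.
toy = pd.DataFrame(
    {
        "LotFrontage": [65.0, np.nan, 68.0],
        "Fence": [np.nan, "MnPrv", np.nan],
    },
    index=pd.Index([1, 2, 3], name="Id"),
)

# One fill value per column: a number for the numeric column,
# a placeholder label (assumed, not part of the dataset) for the text column.
filled = toy.fillna(
    {"LotFrontage": toy["LotFrontage"].median(), "Fence": "NoFence"}
)

print(filled["LotFrontage"].tolist())
print(filled["Fence"].tolist())
```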


## Filtering

### Filtering by column names

In [183]:
df.filter(like='Garage')

Id,GarageType,GarageYrBlt,GarageFinish,GarageCars,GarageArea,GarageQual,GarageCond
1,Attchd,2003.0,RFn,2,548,TA,TA
2,Attchd,1976.0,RFn,2,460,TA,TA
3,Attchd,2001.0,RFn,2,608,TA,TA
4,Detchd,1998.0,Unf,3,642,TA,TA
5,Attchd,2000.0,RFn,3,836,TA,TA
...,...,...,...,...,...,...,...
96,BuiltIn,1993.0,Fin,2,420,TA,TA
97,Attchd,1999.0,RFn,2,472,TA,TA
98,Attchd,1965.0,Fin,1,432,TA,TA
99,Basment,1920.0,Unf,1,366,Fa,TA


In [184]:
df.filter(regex='Yr|Year')

Id,YearBuilt,YearRemodAdd,GarageYrBlt,YrSold
1,2003,2003,2003.0,2008
2,1976,1976,1976.0,2007
3,2001,2002,2001.0,2008
4,1915,1970,1998.0,2006
5,2000,2000,2000.0,2008
...,...,...,...,...
96,1993,1993,1993.0,2009
97,1999,1999,1999.0,2006
98,1965,1965,1965.0,2007
99,1920,1950,1920.0,2010
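
The three selection modes of `filter` (`items`, `like`, `regex`) can be compared side by side on a small made-up table. The sketch below assumes nothing beyond pandas itself; the name `toy` and the values are hypothetical.

```python
import pandas as pd

# Hypothetical toy table; column names follow the dataset, values are made up.
toy = pd.DataFrame(
    [[2003, 2003.0, 548, 2008], [1976, 1976.0, 460, 2007]],
    columns=["YearBuilt", "GarageYrBlt", "GarageArea", "YrSold"],
    index=pd.Index([1, 2], name="Id"),
)

by_items = toy.filter(items=["YearBuilt", "YrSold"])  # exact column names
by_like = toy.filter(like="Garage")                   # substring match
by_prefix = toy.filter(regex="^Y")                    # names starting with "Y"
by_regex = toy.filter(regex="Yr|Year")                # "Yr" or "Year" anywhere

print(list(by_like.columns))
print(list(by_prefix.columns))
print(list(by_regex.columns))
```

Note that `regex` uses a search rather than a full match, which is why `GarageYrBlt` is selected by `'Yr|Year'` but not by `'^Y'`.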


### Filtering by content

We use indexing and boolean operations:

In [185]:
df.loc[df.SaleCondition == 'Normal']

Id,MSSubClass,MSZoning,LotFrontage,LotArea,Street,Alley,LotShape,LandContour,Utilities,LotConfig,LandSlope,Neighborhood,Condition1,Condition2,BldgType,...,WoodDeckSF,OpenPorchSF,EnclosedPorch,3SsnPorch,ScreenPorch,PoolArea,PoolQC,Fence,MiscFeature,MiscVal,MoSold,YrSold,SaleType,SaleCondition,SalePrice
1,60,RL,65.0,8450,Pave,,Reg,Lvl,AllPub,Inside,Gtl,CollgCr,Norm,Norm,1Fam,...,0,61,0,0,0,0,,,,0,2,2008,WD,Normal,208500
2,20,RL,80.0,9600,Pave,,Reg,Lvl,AllPub,FR2,Gtl,Veenker,Feedr,Norm,1Fam,...,298,0,0,0,0,0,,,,0,5,2007,WD,Normal,181500
3,60,RL,68.0,11250,Pave,,IR1,Lvl,AllPub,Inside,Gtl,CollgCr,Norm,Norm,1Fam,...,0,42,0,0,0,0,,,,0,9,2008,WD,Normal,223500
5,60,RL,84.0,14260,Pave,,IR1,Lvl,AllPub,FR2,Gtl,NoRidge,Norm,Norm,1Fam,...,192,84,0,0,0,0,,,,0,12,2008,WD,Normal,250000
6,50,RL,85.0,14115,Pave,,IR1,Lvl,AllPub,Inside,Gtl,Mitchel,Norm,Norm,1Fam,...,40,30,0,320,0,0,,MnPrv,Shed,700,10,2009,WD,Normal,143000
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
95,60,RL,69.0,9337,Pave,,IR1,Lvl,AllPub,Inside,Gtl,CollgCr,Norm,Norm,1Fam,...,0,162,0,0,0,0,,,,0,5,2007,WD,Normal,204750
96,60,RL,,9765,Pave,,IR2,Lvl,AllPub,Corner,Gtl,Gilbert,Norm,Norm,1Fam,...,232,63,0,0,0,0,,,Shed,480,4,2009,WD,Normal,185000
97,20,RL,78.0,10264,Pave,,IR1,Lvl,AllPub,Inside,Gtl,CollgCr,Norm,Norm,1Fam,...,158,29,0,0,0,0,,,,0,8,2006,WD,Normal,214000
98,20,RL,73.0,10921,Pave,,Reg,HLS,AllPub,Inside,Gtl,Edwards,Norm,Norm,1Fam,...,120,0,0,0,0,0,,,,0,5,2007,WD,Normal,94750


In [186]:
df.loc[df.SalePrice < 150000, 'Fence']

Id
4        NaN
6      MnPrv
9        NaN
10       NaN
11       NaN
       ...  
92      GdWo
94       NaN
98       NaN
99       NaN
100      NaN
Name: Fence, Length: 48, dtype: object

In [187]:
df.loc[df.SalePrice < 150000, ['PoolArea', 'Fence', 'SalePrice']]

Id,PoolArea,Fence,SalePrice
4,0,,140000
6,0,MnPrv,143000
9,0,,129900
10,0,,118000
11,0,,129500
...,...,...,...
92,0,GdWo,98600
94,0,,133900
98,0,,94750
99,0,,83000
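
The cells above use a single condition each. Several conditions can be combined with `&` (and), `|` (or) and `~` (not); each condition needs its own parentheses because these operators bind more tightly than comparisons. A minimal sketch on made-up data (the name `toy` and the values are hypothetical):

```python
import pandas as pd

# Hypothetical toy table; values are made up.
toy = pd.DataFrame(
    {
        "SaleCondition": ["Normal", "Abnorml", "Normal", "Partial"],
        "SalePrice": [208500, 140000, 94750, 325300],
        "Fence": ["MnPrv", None, "GdWo", None],
    },
    index=pd.Index([1, 2, 3, 4], name="Id"),
)

# Both conditions must hold.
cheap_normal = toy.loc[(toy.SalePrice < 150000) & (toy.SaleCondition == "Normal")]

# Negation: every sale whose condition is not in the list.
not_normal = toy.loc[~toy.SaleCondition.isin(["Normal"])]

# Either condition is enough; the second argument selects the columns.
cheap_or_fenced = toy.loc[
    (toy.SalePrice < 150000) | toy.Fence.notna(), ["Fence", "SalePrice"]
]

print(cheap_normal.index.tolist())
print(not_normal.index.tolist())
print(cheap_or_fenced.index.tolist())
```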


## Sorting

By index:

In [190]:
df.sort_index(ascending=False)

Id,MSSubClass,MSZoning,LotFrontage,LotArea,Street,Alley,LotShape,LandContour,Utilities,LotConfig,LandSlope,Neighborhood,Condition1,Condition2,BldgType,...,WoodDeckSF,OpenPorchSF,EnclosedPorch,3SsnPorch,ScreenPorch,PoolArea,PoolQC,Fence,MiscFeature,MiscVal,MoSold,YrSold,SaleType,SaleCondition,SalePrice
100,20,RL,77.0,9320,Pave,,IR1,Lvl,AllPub,Inside,Gtl,NAmes,Norm,Norm,1Fam,...,352,0,0,0,0,0,,,Shed,400,1,2010,WD,Normal,128950
99,30,RL,85.0,10625,Pave,,Reg,Lvl,AllPub,Corner,Gtl,Edwards,Norm,Norm,1Fam,...,0,0,77,0,0,0,,,Shed,400,5,2010,COD,Abnorml,83000
98,20,RL,73.0,10921,Pave,,Reg,HLS,AllPub,Inside,Gtl,Edwards,Norm,Norm,1Fam,...,120,0,0,0,0,0,,,,0,5,2007,WD,Normal,94750
97,20,RL,78.0,10264,Pave,,IR1,Lvl,AllPub,Inside,Gtl,CollgCr,Norm,Norm,1Fam,...,158,29,0,0,0,0,,,,0,8,2006,WD,Normal,214000
96,60,RL,,9765,Pave,,IR2,Lvl,AllPub,Corner,Gtl,Gilbert,Norm,Norm,1Fam,...,232,63,0,0,0,0,,,Shed,480,4,2009,WD,Normal,185000
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
5,60,RL,84.0,14260,Pave,,IR1,Lvl,AllPub,FR2,Gtl,NoRidge,Norm,Norm,1Fam,...,192,84,0,0,0,0,,,,0,12,2008,WD,Normal,250000
4,70,RL,60.0,9550,Pave,,IR1,Lvl,AllPub,Corner,Gtl,Crawfor,Norm,Norm,1Fam,...,0,35,272,0,0,0,,,,0,2,2006,WD,Abnorml,140000
3,60,RL,68.0,11250,Pave,,IR1,Lvl,AllPub,Inside,Gtl,CollgCr,Norm,Norm,1Fam,...,0,42,0,0,0,0,,,,0,9,2008,WD,Normal,223500
2,20,RL,80.0,9600,Pave,,Reg,Lvl,AllPub,FR2,Gtl,Veenker,Feedr,Norm,1Fam,...,298,0,0,0,0,0,,,,0,5,2007,WD,Normal,181500


In [191]:
df.sort_index(axis=1)

Id,1stFlrSF,2ndFlrSF,3SsnPorch,Alley,BedroomAbvGr,BldgType,BsmtCond,BsmtExposure,BsmtFinSF1,BsmtFinSF2,BsmtFinType1,BsmtFinType2,BsmtFullBath,BsmtHalfBath,BsmtQual,...,PoolQC,RoofMatl,RoofStyle,SaleCondition,SalePrice,SaleType,ScreenPorch,Street,TotRmsAbvGrd,TotalBsmtSF,Utilities,WoodDeckSF,YearBuilt,YearRemodAdd,YrSold
1,856,854,0,,3,1Fam,TA,No,706,0,GLQ,Unf,1,0,Gd,...,,CompShg,Gable,Normal,208500,WD,0,Pave,8,856,AllPub,0,2003,2003,2008
2,1262,0,0,,3,1Fam,TA,Gd,978,0,ALQ,Unf,0,1,Gd,...,,CompShg,Gable,Normal,181500,WD,0,Pave,6,1262,AllPub,298,1976,1976,2007
3,920,866,0,,3,1Fam,TA,Mn,486,0,GLQ,Unf,1,0,Gd,...,,CompShg,Gable,Normal,223500,WD,0,Pave,6,920,AllPub,0,2001,2002,2008
4,961,756,0,,3,1Fam,Gd,No,216,0,ALQ,Unf,1,0,TA,...,,CompShg,Gable,Abnorml,140000,WD,0,Pave,7,756,AllPub,0,1915,1970,2006
5,1145,1053,0,,4,1Fam,TA,Av,655,0,GLQ,Unf,1,0,Gd,...,,CompShg,Gable,Normal,250000,WD,0,Pave,9,1145,AllPub,192,2000,2000,2008
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
96,680,790,0,,3,1Fam,Gd,No,310,0,ALQ,Unf,0,0,Gd,...,,CompShg,Gable,Normal,185000,WD,0,Pave,6,680,AllPub,232,1993,1993,2009
97,1588,0,0,,3,1Fam,TA,Av,1162,0,ALQ,Unf,0,0,Gd,...,,CompShg,Gable,Normal,214000,WD,0,Pave,6,1588,AllPub,158,1999,1999,2006
98,960,0,0,,3,1Fam,TA,No,520,0,Rec,Unf,1,0,TA,...,,CompShg,Hip,Normal,94750,WD,0,Pave,6,960,AllPub,120,1965,1965,2007
99,835,0,0,,2,1Fam,TA,No,108,0,ALQ,Unf,0,0,TA,...,,CompShg,Gable,Abnorml,83000,COD,0,Pave,5,458,AllPub,0,1920,1950,2010


By values:

In [192]:
df.sort_values('SalePrice')

Id,MSSubClass,MSZoning,LotFrontage,LotArea,Street,Alley,LotShape,LandContour,Utilities,LotConfig,LandSlope,Neighborhood,Condition1,Condition2,BldgType,...,WoodDeckSF,OpenPorchSF,EnclosedPorch,3SsnPorch,ScreenPorch,PoolArea,PoolQC,Fence,MiscFeature,MiscVal,MoSold,YrSold,SaleType,SaleCondition,SalePrice
40,0,0,0.0,0,0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0.0,0,0,0,0,0,0,0,0
31,70,C (all),50.0,8500,Pave,Pave,Reg,Lvl,AllPub,Inside,Gtl,IDOTRR,Feedr,Norm,1Fam,...,0,54,172,0,0,0,,MnPrv,,0,7,2008,WD,Normal,40000
30,30,RM,60.0,6324,Pave,,IR1,Lvl,AllPub,Inside,Gtl,BrkSide,Feedr,RRNn,1Fam,...,49,0,87,0,0,0,,,,0,5,2008,WD,Normal,68500
69,30,RM,47.0,4608,Pave,,Reg,Lvl,AllPub,Corner,Gtl,OldTown,Artery,Norm,1Fam,...,0,0,0,0,0,0,,,,0,6,2010,WD,Normal,80000
99,30,RL,85.0,10625,Pave,,Reg,Lvl,AllPub,Corner,Gtl,Edwards,Norm,Norm,1Fam,...,0,0,77,0,0,0,,,Shed,400,5,2010,COD,Abnorml,83000
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
46,120,RL,61.0,7658,Pave,,Reg,Lvl,AllPub,Inside,Gtl,NridgHt,Norm,Norm,TwnhsE,...,196,82,0,0,0,0,,,,0,2,2010,WD,Normal,319900
21,60,RL,101.0,14215,Pave,,IR1,Lvl,AllPub,Corner,Gtl,NridgHt,Norm,Norm,1Fam,...,240,154,0,0,0,0,,,,0,11,2006,New,Partial,325300
12,60,RL,85.0,11924,Pave,,IR1,Lvl,AllPub,Inside,Gtl,NridgHt,Norm,Norm,1Fam,...,147,21,0,0,0,0,,,,0,7,2006,New,Partial,345000
54,20,RL,68.0,50271,Pave,,IR1,Low,AllPub,Inside,Gtl,Veenker,Norm,Norm,1Fam,...,857,72,0,0,0,0,,,,0,11,2006,WD,Normal,385000


In [194]:
df.sort_values(['SaleCondition', 'SalePrice']).to_csv('train_sorted.csv')
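
The cell above sorts by two columns and writes the result to a file. The sketch below reproduces the same chain on a made-up table, but writes to an in-memory string instead of `train_sorted.csv` so that it can be run without touching the disk. Reading the text back with `index_col="Id"` is an assumption about how one would reload such a file, not something the original notebook does.

```python
import io

import pandas as pd

# Hypothetical toy table; values are made up.
toy = pd.DataFrame(
    {
        "SaleCondition": ["Normal", "Abnorml", "Normal"],
        "SalePrice": [208500, 140000, 94750],
    },
    index=pd.Index([1, 2, 3], name="Id"),
)

# Sort by SaleCondition first, then by SalePrice within each condition.
sorted_toy = toy.sort_values(["SaleCondition", "SalePrice"])

# Without a path, to_csv returns the CSV text; the index is written as a column.
csv_text = sorted_toy.to_csv()

# Read the text back, telling pandas which column holds the index.
restored = pd.read_csv(io.StringIO(csv_text), index_col="Id")

print(csv_text.splitlines()[0])
print(restored.index.tolist())
```

The row order produced by the sort is preserved in the file, so the restored table comes back already sorted.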

## Answers to questions

In [205]:
df.fillna(u'Заполнить')

Id,MSSubClass,MSZoning,LotFrontage,LotArea,Street,Alley,LotShape,LandContour,Utilities,LotConfig,LandSlope,Neighborhood,Condition1,Condition2,BldgType,...,WoodDeckSF,OpenPorchSF,EnclosedPorch,3SsnPorch,ScreenPorch,PoolArea,PoolQC,Fence,MiscFeature,MiscVal,MoSold,YrSold,SaleType,SaleCondition,SalePrice
1,60,RL,65,8450,Pave,Заполнить,Reg,Lvl,AllPub,Inside,Gtl,CollgCr,Norm,Norm,1Fam,...,0,61,0,0,0,0,Заполнить,Заполнить,Заполнить,0,2,2008,WD,Normal,208500
2,20,RL,80,9600,Pave,Заполнить,Reg,Lvl,AllPub,FR2,Gtl,Veenker,Feedr,Norm,1Fam,...,298,0,0,0,0,0,Заполнить,Заполнить,Заполнить,0,5,2007,WD,Normal,181500
3,60,RL,68,11250,Pave,Заполнить,IR1,Lvl,AllPub,Inside,Gtl,CollgCr,Norm,Norm,1Fam,...,0,42,0,0,0,0,Заполнить,Заполнить,Заполнить,0,9,2008,WD,Normal,223500
4,70,RL,60,9550,Pave,Заполнить,IR1,Lvl,AllPub,Corner,Gtl,Crawfor,Norm,Norm,1Fam,...,0,35,272,0,0,0,Заполнить,Заполнить,Заполнить,0,2,2006,WD,Abnorml,140000
5,60,RL,84,14260,Pave,Заполнить,IR1,Lvl,AllPub,FR2,Gtl,NoRidge,Norm,Norm,1Fam,...,192,84,0,0,0,0,Заполнить,Заполнить,Заполнить,0,12,2008,WD,Normal,250000
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
96,60,RL,Заполнить,9765,Pave,Заполнить,IR2,Lvl,AllPub,Corner,Gtl,Gilbert,Norm,Norm,1Fam,...,232,63,0,0,0,0,Заполнить,Заполнить,Shed,480,4,2009,WD,Normal,185000
97,20,RL,78,10264,Pave,Заполнить,IR1,Lvl,AllPub,Inside,Gtl,CollgCr,Norm,Norm,1Fam,...,158,29,0,0,0,0,Заполнить,Заполнить,Заполнить,0,8,2006,WD,Normal,214000
98,20,RL,73,10921,Pave,Заполнить,Reg,HLS,AllPub,Inside,Gtl,Edwards,Norm,Norm,1Fam,...,120,0,0,0,0,0,Заполнить,Заполнить,Заполнить,0,5,2007,WD,Normal,94750
99,30,RL,85,10625,Pave,Заполнить,Reg,Lvl,AllPub,Corner,Gtl,Edwards,Norm,Norm,1Fam,...,0,0,77,0,0,0,Заполнить,Заполнить,Shed,400,5,2010,COD,Abnorml,83000


Case-insensitive filtering can be done with a regular expression. A plain string pattern is matched case-sensitively, so here the pattern is passed in compiled form with the `re.IGNORECASE` flag:

In [206]:
import re
df.filter(regex=re.compile(r'garage', re.IGNORECASE))

Id,GarageType,GarageYrBlt,GarageFinish,GarageCars,GarageArea,GarageQual,GarageCond
1,Attchd,2003.0,RFn,2,548,TA,TA
2,Attchd,1976.0,RFn,2,460,TA,TA
3,Attchd,2001.0,RFn,2,608,TA,TA
4,Detchd,1998.0,Unf,3,642,TA,TA
5,Attchd,2000.0,RFn,3,836,TA,TA
...,...,...,...,...,...,...,...
96,BuiltIn,1993.0,Fin,2,420,TA,TA
97,Attchd,1999.0,RFn,2,472,TA,TA
98,Attchd,1965.0,Fin,1,432,TA,TA
99,Basment,1920.0,Unf,1,366,Fa,TA
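
For comparison, here is a sketch of three ways to get the same case-insensitive column selection on a made-up table. The column `garage_cars` is hypothetical and exists only to show that lower-case names are matched too. The inline flag `(?i)` is standard Python regex syntax, and the last variant builds a boolean mask from the column names instead of using `filter`.

```python
import re

import pandas as pd

# Hypothetical toy table; "garage_cars" is an invented lower-case column name.
toy = pd.DataFrame(
    [[548, 2, 208500]],
    columns=["GarageArea", "garage_cars", "SalePrice"],
)

# 1. Compiled pattern with the IGNORECASE flag, as in the cell above.
by_compiled = toy.filter(regex=re.compile(r"garage", re.IGNORECASE))

# 2. String pattern with the inline case-insensitivity flag.
by_inline = toy.filter(regex=r"(?i)garage")

# 3. Boolean mask over the column names.
by_mask = toy.loc[:, toy.columns.str.contains("garage", case=False)]

print(list(by_compiled.columns))
print(list(by_inline.columns))
print(list(by_mask.columns))
```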


To sort by some columns in ascending order and by others in descending order, use the same `ascending` parameter, but pass it a list of `True`/`False` values, one for each column being sorted on. Here `SaleCondition` is sorted in ascending order and `SalePrice` in descending order:

In [207]:
df.sort_values(['SaleCondition', 'SalePrice'], ascending=[True, False])

Id,MSSubClass,MSZoning,LotFrontage,LotArea,Street,Alley,LotShape,LandContour,Utilities,LotConfig,LandSlope,Neighborhood,Condition1,Condition2,BldgType,...,WoodDeckSF,OpenPorchSF,EnclosedPorch,3SsnPorch,ScreenPorch,PoolArea,PoolQC,Fence,MiscFeature,MiscVal,MoSold,YrSold,SaleType,SaleCondition,SalePrice
40,0,0,0.0,0,0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0.0,0,0,0,0,0,0,0,0
47,50,RL,48.0,12822,Pave,,IR1,Lvl,AllPub,CulDSac,Gtl,Mitchel,Norm,Norm,1Fam,...,168,43,0,0,198,0,,,,0,8,2009,WD,Abnorml,239686
57,160,FV,24.0,2645,Pave,Pave,Reg,Lvl,AllPub,Inside,Gtl,Somerst,Norm,Norm,Twnhs,...,115,0,0,0,0,0,,,,0,8,2009,WD,Abnorml,172500
41,20,RL,84.0,8658,Pave,,Reg,Lvl,AllPub,Inside,Gtl,NAmes,Norm,Norm,1Fam,...,0,138,0,0,0,0,,GdWo,,0,12,2006,WD,Abnorml,160000
4,70,RL,60.0,9550,Pave,,IR1,Lvl,AllPub,Corner,Gtl,Crawfor,Norm,Norm,1Fam,...,0,35,272,0,0,0,,,,0,2,2006,WD,Abnorml,140000
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
21,60,RL,101.0,14215,Pave,,IR1,Lvl,AllPub,Corner,Gtl,NridgHt,Norm,Norm,1Fam,...,240,154,0,0,0,0,,,,0,11,2006,New,Partial,325300
14,20,RL,91.0,10652,Pave,,IR1,Lvl,AllPub,Inside,Gtl,CollgCr,Norm,Norm,1Fam,...,160,33,0,0,0,0,,,,0,8,2007,New,Partial,279500
88,160,FV,40.0,3951,Pave,Pave,Reg,Lvl,AllPub,Corner,Gtl,Somerst,Norm,Norm,TwnhsE,...,0,234,0,0,0,0,,,,0,6,2009,New,Partial,164500
61,20,RL,63.0,13072,Pave,,Reg,Lvl,AllPub,Inside,Gtl,SawyerW,RRAe,Norm,1Fam,...,0,50,0,0,0,0,,,,0,5,2006,New,Partial,158000
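
On the full table the mixed ordering is hard to verify by eye, so here is the same call on four made-up rows (the name `toy` and the values are hypothetical). Within each `SaleCondition` group, which appear in alphabetical order, the prices run from highest to lowest.

```python
import pandas as pd

# Hypothetical toy table; values are made up.
toy = pd.DataFrame(
    {
        "SaleCondition": ["Normal", "Abnorml", "Normal", "Abnorml"],
        "SalePrice": [208500, 140000, 94750, 239686],
    },
    index=pd.Index([1, 2, 3, 4], name="Id"),
)

# One flag per sort column: SaleCondition ascending, SalePrice descending.
ordered = toy.sort_values(["SaleCondition", "SalePrice"], ascending=[True, False])

print(ordered.index.tolist())
print(ordered["SalePrice"].tolist())
```

The list passed to `ascending` must have the same length as the list of sort columns.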
