# Introduction

As we saw earlier, a column can hold many different data types. Let's explore them!

The type of a column in a Series or DataFrame is called its **dtype**. To inspect the dtype of a single column, we can use the `dtype` property.

In [1]:
import pandas as pd
reviews = pd.read_csv('../input/wine-reviews/winemag-data-130k-v2.csv', index_col=0)
pd.set_option('display.max_rows', 5)

In [2]:
reviews

Unnamed: 0,country,description,designation,points,price,province,region_1,region_2,taster_name,taster_twitter_handle,title,variety,winery
0,Italy,"Aromas include tropical fruit, broom, brimston...",Vulkà Bianco,87,,Sicily & Sardinia,Etna,,Kerin O’Keefe,@kerinokeefe,Nicosia 2013 Vulkà Bianco (Etna),White Blend,Nicosia
1,Portugal,"This is ripe and fruity, a wine that is smooth...",Avidagos,87,15.0,Douro,,,Roger Voss,@vossroger,Quinta dos Avidagos 2011 Avidagos Red (Douro),Portuguese Red,Quinta dos Avidagos
...,...,...,...,...,...,...,...,...,...,...,...,...,...
129969,France,"A dry style of Pinot Gris, this is crisp with ...",,90,32.0,Alsace,Alsace,,Roger Voss,@vossroger,Domaine Marcel Deiss 2012 Pinot Gris (Alsace),Pinot Gris,Domaine Marcel Deiss
129970,France,"Big, rich and off-dry, this is powered by inte...",Lieu-dit Harth Cuvée Caroline,90,21.0,Alsace,Alsace,,Roger Voss,@vossroger,Domaine Schoffit 2012 Lieu-dit Harth Cuvée Car...,Gewürztraminer,Domaine Schoffit


In [3]:
reviews.price.dtype

dtype('float64')

We can also see the dtype of every column at once with the `dtypes` property:

In [4]:
reviews.dtypes

country        object
description    object
                ...  
variety        object
winery         object
Length: 13, dtype: object

The index also has a dtype:

In [5]:
reviews.index.dtype

dtype('int64')

The `dtype` tells us how pandas stores the data internally. For example, `int64` means the column is filled with 64-bit integers.

Note the `object` dtype. It usually means the column is filled with strings.
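To make the mapping between values and dtypes concrete, here is a minimal sketch on a small made-up DataFrame. The `toy` frame and its values are hypothetical, not taken from the wine reviews data, and the string column is created with an explicit `object` dtype because some pandas versions may infer a dedicated string dtype instead.

```python
import pandas as pd

# Hypothetical toy data, not taken from the wine reviews dataset.
toy = pd.DataFrame({
    "points": [87, 90],                    # whole numbers
    "price": [15.0, None],                 # floats, one value missing
    # Explicit object dtype (assumption: newer pandas may infer a string dtype).
    "country": pd.Series(["Italy", "France"], dtype="object"),
})

points_dtype = toy["points"].dtype    # an integer dtype
price_dtype = toy["price"].dtype      # float64; the missing value is stored as NaN
country_dtype = toy["country"].dtype  # object

print(toy.dtypes)
```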

### What if we want to convert dtypes?

We can do that with the `astype()` method. It takes the new dtype as an argument and converts the column, provided the conversion is possible.

In [6]:
reviews.points.astype('float64')

0         87.0
1         87.0
          ... 
129969    90.0
129970    90.0
Name: points, Length: 129971, dtype: float64
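The conversion is not always possible. As a rough sketch, assuming a hypothetical `raw` Series of strings where one entry is not a number, `astype('int64')` raises a `ValueError`, and the conversion succeeds once the offending entry is filtered out:

```python
import pandas as pd

# Hypothetical sample: numeric strings plus one entry that is not a number.
raw = pd.Series(["87", "90", "n/a"], dtype="object")

try:
    raw.astype("int64")
    conversion_failed = False
except ValueError:
    # "n/a" cannot be parsed as an integer, so the whole conversion fails.
    conversion_failed = True

# Dropping the bad entry first lets the conversion go through.
clean = raw[raw != "n/a"].astype("int64")
print(conversion_failed, clean.tolist())
```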

<p style='color:red'>Note:</p>

The `astype()` method returns a new Series. It doesn't modify the original data!

In [7]:
reviews.points

0         87
1         87
          ..
129969    90
129970    90
Name: points, Length: 129971, dtype: int64
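If the converted values should be kept, one option is to assign the result back. A minimal sketch, using a hypothetical two-element `points` Series rather than the real column:

```python
import pandas as pd

points = pd.Series([87, 90], name="points")  # hypothetical sample

as_float = points.astype("float64")  # new Series; `points` is left untouched
dtype_before = points.dtype          # still an integer dtype

points = points.astype("float64")    # reassign to keep the conversion
dtype_after = points.dtype           # float64
print(dtype_before, dtype_after)
```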

## Dealing with missing data

Missing values are represented as `NaN` (short for "Not a Number"). To select the rows with `NaN` values, we can use `pd.isnull()`:

In [8]:
reviews[pd.isnull(reviews.country)]

Unnamed: 0,country,description,designation,points,price,province,region_1,region_2,taster_name,taster_twitter_handle,title,variety,winery
913,,"Amber in color, this wine has aromas of peach ...",Asureti Valley,87,30.0,,,,Mike DeSimone,@worldwineguys,Gotsa Family Wines 2014 Asureti Valley Chinuri,Chinuri,Gotsa Family Wines
3131,,"Soft, fruity and juicy, this is a pleasant, si...",Partager,83,,,,,Roger Voss,@vossroger,Barton & Guestier NV Partager Red,Red Blend,Barton & Guestier
...,...,...,...,...,...,...,...,...,...,...,...,...,...
129590,,"A blend of 60% Syrah, 30% Cabernet Sauvignon a...",Shah,90,30.0,,,,Mike DeSimone,@worldwineguys,Büyülübağ 2012 Shah Red,Red Blend,Büyülübağ
129900,,This wine offers a delightful bouquet of black...,,91,32.0,,,,Mike DeSimone,@worldwineguys,Psagot 2014 Merlot,Merlot,Psagot
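The selection above works because `pd.isnull()` returns a boolean Series that is then used as a mask. A small sketch on hypothetical data shows the mask itself, plus `pd.notnull()` for the opposite selection:

```python
import numpy as np
import pandas as pd

# Hypothetical toy data with one missing country.
toy = pd.DataFrame({
    "country": ["Italy", np.nan, "France"],
    "points": [87, 83, 90],
})

missing_mask = pd.isnull(toy["country"])  # True where the value is missing
missing_rows = toy[missing_mask]          # rows with a missing country
n_missing = int(missing_mask.sum())       # True counts as 1, so this counts them

present_rows = toy[pd.notnull(toy["country"])]  # rows where country is present
print(missing_mask.tolist(), n_missing)
```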


To replace missing values, we can use the `fillna()` method:

In [9]:
reviews.region_2.fillna('Unknown')

0         Unknown
1         Unknown
           ...   
129969    Unknown
129970    Unknown
Name: region_2, Length: 129971, dtype: object
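In the output above every visible row happened to be missing. As an illustrative sketch with a hypothetical `region` Series, `fillna()` only touches the missing entries and, like `astype()`, returns a new Series:

```python
import numpy as np
import pandas as pd

# Hypothetical sample mixing present and missing regions.
region = pd.Series(["Napa", np.nan, "Sonoma", np.nan], name="region_2")

filled = region.fillna("Unknown")            # only the NaN entries are replaced
still_missing = int(region.isnull().sum())   # the original Series is unchanged
print(filled.tolist(), still_missing)
```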

We may also want to replace a value that is not missing. The `replace()` method does this:

In [10]:
reviews.taster_twitter_handle.replace("@kerinokeefe", "@kerino")

0            @kerino
1         @vossroger
             ...    
129969    @vossroger
129970    @vossroger
Name: taster_twitter_handle, Length: 129971, dtype: object
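A short sketch of how `replace()` behaves, using a hypothetical `handles` Series: by default it matches whole values rather than substrings, and it returns a new Series instead of changing the original.

```python
import pandas as pd

# Hypothetical sample of Twitter handles.
handles = pd.Series(["@kerinokeefe", "@vossroger", "@kerinokeefe"])

renamed = handles.replace("@kerinokeefe", "@kerino")  # every exact match is replaced

# A partial string matches no whole value, so nothing changes here.
partial = handles.replace("@kerin", "@k")
print(renamed.tolist(), partial.tolist())
```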