# Abstract
The resources that Kaggle allocates for running code are limited: for example, 13 or 16 GB of RAM, depending on the account. The TPS-October data is more than 2 GB, which makes it hard to work with under these restrictions. In this article we will see how to make better use of the available RAM.

Because pandas is so capable, most data scientists use it by default. This is a good choice, but not when working with big data.
Another common issue is ignoring the data types of the columns. In the following, we will deal with these two points.

In [None]:
import time
import pandas as pd
import datatable as dt
import psutil
import gc

Pandas loads data slowly. One alternative is to read the file with `datatable` and convert the result to a pandas DataFrame. [Link](https://www.kaggle.com/bextuychiev/how-to-work-w-million-row-datasets-like-a-pro#Read-in-the-massive-dataset)

In [None]:
%%time
train_pd = pd.read_csv('/kaggle/input/tabular-playground-series-oct-2021/train.csv')


In [None]:
%%time
train_dt = dt.fread('/kaggle/input/tabular-playground-series-oct-2021/train.csv').to_pandas()

We can drop the columns that we do not need while reading the data, which saves both time and memory.

In [None]:
del train_dt
gc.collect()

In [None]:
%%time
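# Keep every column except 'id'. Compare with != because ('id') is a plain string, so `not in ('id')` would be a substring test.
train_dt = dt.fread('/kaggle/input/tabular-playground-series-oct-2021/train.csv', columns=lambda cols: [col.name != 'id' for col in cols]).to_pandas()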
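pandas can also skip columns at read time. The sketch below is only an illustration: it uses a tiny in-memory CSV with made-up column names instead of the competition file, and relies on the `usecols` parameter of `pd.read_csv`, which accepts a callable evaluated against each column name.

```python
import io
import pandas as pd

# Tiny in-memory CSV standing in for train.csv (hypothetical columns and values).
csv_text = "id,f0,f1,target\n0,0.5,1,0\n1,0.25,0,1\n"

# usecols accepts a callable that is evaluated against each column name;
# columns for which it returns False are skipped during parsing.
df = pd.read_csv(io.StringIO(csv_text), usecols=lambda name: name != "id")

print(list(df.columns))
```

Because the skipped column is never materialized, this avoids allocating memory for it in the first place.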

In [None]:
memory_pd_1 = train_pd.memory_usage(index = True).sum() / 1e9
memory_dt_1 = train_dt.memory_usage(index = True).sum() / 1e9

In [None]:
print('Memory consumed through Pandas: {:0.2f} GB'.format(memory_pd_1))
print('Memory consumed through DataTable: {:0.2f} GB'.format(memory_dt_1))

The difference comes from the data types that each reader infers for the columns.

In [None]:
print(f'{3*"="} For Pandas {10*"="}\n{(train_pd.dtypes).value_counts()}')
print(f'\n{3*"="} For Datatable {7*"="}\n{(train_dt.dtypes).value_counts()}')
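To see why the dtype matters, the following sketch measures how many bytes each dtype needs per value. The row count and column names are assumptions chosen only for illustration; they are not taken from the competition data.

```python
import numpy as np
import pandas as pd

n_rows = 1_000_000  # hypothetical size, chosen only for illustration
values = np.zeros(n_rows)

# The same values stored under four different dtypes.
demo = pd.DataFrame({
    "as_float64": values,
    "as_float32": values.astype("float32"),
    "as_int64": values.astype("int64"),
    "as_bool": values.astype("bool"),
})

# memory_usage(index=False) reports the bytes held by each column.
bytes_per_value = demo.memory_usage(index=False) / n_rows
print(bytes_per_value)
```

A `float64` or `int64` value takes 8 bytes, a `float32` value 4 bytes, and a `bool` value 1 byte, so a frame made mostly of 0/1 columns shrinks considerably when those columns are stored as `bool`.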

# Converting Type

Does converting data change their values? **Depends on the data type**

Is it possible to do this easily using the existing functions? **No**

In the following, we will look at both of these questions.
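As a small illustration of the first question, the sketch below uses made-up values to show two cases: `float64` to `float32` rounds the values slightly, while `int64` to `bool` is lossless only when the column contains nothing but 0 and 1.

```python
import numpy as np
import pandas as pd

# Hypothetical float values; float32 cannot represent them exactly.
floats = pd.Series([0.1, 0.123456789012345, 1e-3], dtype="float64")
as_f32 = floats.astype("float32")

# Cast back to float64 to measure how far float32 rounding moved each value.
max_abs_error = (floats - as_f32.astype("float64")).abs().max()
print(max_abs_error)

binary_ints = pd.Series([0, 1, 1, 0], dtype="int64")
other_ints = pd.Series([0, 1, 2, 3], dtype="int64")

# Round-trip through bool: lossless only when every value is 0 or 1.
binary_ok = binary_ints.astype("bool").astype("int64").equals(binary_ints)
other_ok = other_ints.astype("bool").astype("int64").equals(other_ints)
print(binary_ok, other_ok)
```

The float error is tiny but not zero, and the second integer series loses information because every non-zero value collapses to `True`.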



In [None]:
## clear memory
# Use true division: floor division (//) would discard the decimals before formatting
print('{:0.2f} GB of memory used, so we free some of it'.format(psutil.virtual_memory().used / 1e9))
del train_dt
gc.collect()
print('Memory cleared')
print('{:0.2f} GB of memory used'.format(psutil.virtual_memory().used / 1e9))

In [None]:
# Make a copy for comparison 
train_pd_copy = train_pd.copy()

<div class="alert alert-danger">
<svg xmlns="http://www.w3.org/2000/svg" width="32" height="32" viewBox="0 0 16 16" fill="currentColor">
  <path d="M8.982 1.566a1.13 1.13 0 0 0-1.96 0L.165 13.233c-.457.778.091 1.767.98 1.767h13.713c.889 0 1.438-.99.98-1.767L8.982 1.566zM8 5c.535 0 .954.462.9.995l-.35 3.507a.552.552 0 0 1-1.1 0L7.1 5.995A.905.905 0 0 1 8 5zm.002 6a1 1 0 1 1 0 2 1 1 0 0 1 0-2z"/>
</svg>
<b style="font-size: x-large;">ATTENTION</b><br>
    Running the line below will overflow the memory. For this reason, I could not compare the existing functions with the manual method.
    
</div>

In [None]:
# Running this line will overflow the memory 
# train_pd[train_pd.select_dtypes(include='float64') ] = train_pd.select_dtypes(include='float64').astype('float32')

In [None]:
def convert_type(col, new_type):
    return train_pd[col].astype(new_type)

In [None]:
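# Convert one column at a time so that only a single column is duplicated in memory
for c in train_pd.columns:
    if train_pd[c].dtype == 'float64':
        train_pd[c] = convert_type(c, 'float32')
    elif train_pd[c].dtype == 'int64':
        # Assumes the int64 columns hold only 0/1; any other value would become True
        train_pd[c] = convert_type(c, 'bool')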
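The loop above relies on the integer columns being binary. A more defensive variant could check that before converting. The sketch below is one possible way to do it, not the method used in this notebook: `downcast_frame` is a hypothetical helper, the demo frame is made up, and non-binary integers fall back to `pd.to_numeric(..., downcast="integer")`.

```python
import pandas as pd

def downcast_frame(df: pd.DataFrame) -> pd.DataFrame:
    """Downcast df column by column (hypothetical helper, not from the notebook)."""
    for name in df.columns:
        col = df[name]
        if col.dtype == "float64":
            df[name] = col.astype("float32")
        elif col.dtype == "int64":
            if col.isin([0, 1]).all():
                # Only 0/1 present, so bool keeps every value.
                df[name] = col.astype("bool")
            else:
                # Otherwise pick the smallest integer type that fits the values.
                df[name] = pd.to_numeric(col, downcast="integer")
    return df

# Made-up data: one float column, one binary column, one general integer column.
demo = pd.DataFrame({"x": [0.5, 1.5], "flag": [0, 1], "count": [10, 300]})
demo = downcast_frame(demo)
print(demo.dtypes)
```

The extra `isin` check costs one pass over each integer column, which seems a reasonable price for not silently turning values such as 2 or 3 into `True`.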

In [None]:
memory_pd_2 = train_pd.memory_usage(index=True).sum() / 1e9
# Savings are the share of memory freed, not the share that remains
memory_pd_percent = (memory_pd_1 - memory_pd_2) * 100 / memory_pd_1
print('Memory consumed before converting was {:0.2f} GB and after converting is {:0.2f} GB, so {:0.2f}% savings'.format(memory_pd_1, memory_pd_2, memory_pd_percent))

**I will publish the comparison section soon.**