# Speeding up with Dask

**Data Science with Raghav**

### Dependencies

In [1]:
import pandas as pd
import dask.dataframe as dd  # Dask's parallel, partitioned DataFrame API
from textblob import TextBlob

### Generate sample text dataset

In [2]:
data_text = ['Hello World I am good']*100000

In [3]:
df = pd.DataFrame(data_text, columns=['text'])

In [4]:
df.shape

(100000, 1)

### Create a custom sentiment-extraction function that is slow to run

In [5]:
def get_sentiment(text):
    # TextBlob's default analyzer returns a (polarity, subjectivity) named tuple.
    blob = TextBlob(text)
    return blob.sentiment

### Apply the custom function with the regular pandas `apply`

In [6]:
%time new_df = df['text'].apply(get_sentiment)

Wall time: 32.7 s


### Use Dask for parallel processing
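
Conceptually, `map_partitions` splits the data into chunks, applies a function to each chunk, and stitches the results back together. The sketch below illustrates only that idea in plain Python. It is not how Dask is implemented, it runs the chunks one after another rather than in parallel, and `split_into_partitions`, `map_partitions_sketch`, and `fake_sentiment` are hypothetical names introduced here for illustration.

```python
# Conceptual sketch only: NOT Dask's implementation, and not parallel.

def split_into_partitions(rows, npartitions):
    """Split a list into at most `npartitions` contiguous chunks."""
    chunk_size = max(1, -(-len(rows) // npartitions))  # ceiling division
    return [rows[i:i + chunk_size] for i in range(0, len(rows), chunk_size)]


def fake_sentiment(text):
    """Hypothetical stand-in for get_sentiment: just counts the words."""
    return len(text.split())


def map_partitions_sketch(func, partitions):
    """Apply `func` to every row of every partition and combine the results."""
    results = []
    for part in partitions:  # Dask could hand each partition to a worker
        results.extend(func(row) for row in part)
    return results


rows = ['Hello World I am good'] * 10
partitions = split_into_partitions(rows, npartitions=4)
combined = map_partitions_sketch(fake_sentiment, partitions)

print([len(part) for part in partitions])
print(combined == [fake_sentiment(row) for row in rows])
```

The combined result matches what a single pass over all rows would produce; the benefit of partitioning comes only when the partitions are processed concurrently.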

In [7]:
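# Split the DataFrame into 24 partitions; each partition is a regular pandas DataFrame.
ddata = dd.from_pandas(df, npartitions=24)
# Run the row-wise apply on every partition using the multiprocessing scheduler.
%time new_df = ddata.map_partitions(lambda part: part.apply(lambda row: get_sentiment(row['text']), axis=1)).compute(scheduler='processes')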

Wall time: 30.4 s
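
To put the two timings side by side, the short calculation below turns them into a speedup factor and a percentage of time saved. It uses only the wall times reported in this notebook; the variable names are illustrative.

```python
# Wall times reported by the two %time cells, in seconds.
pandas_seconds = 32.7
dask_seconds = 30.4

# Speedup factor: how many times faster the Dask run was.
speedup = pandas_seconds / dask_seconds

# Share of the original wall time that was saved.
time_saved_pct = 100 * (pandas_seconds - dask_seconds) / pandas_seconds

print(f"Speedup: {speedup:.2f}x")
print(f"Time saved: {time_saved_pct:.1f}%")
```

For this run that works out to roughly a 1.08x speedup, or about 7% less wall time. A gain this small suggests that overhead, such as starting worker processes and moving data between them, may have absorbed much of the benefit, although the notebook does not measure that directly.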


## Conclusion

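**Dask can use all the cores on your local machine to parallelize data processing. In this run the gain was modest (30.4 s versus 32.7 s, about 7% less wall time), so measure on your own workload to see how much faster it makes your process.**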