# Pandaral·lel

Pandaral·lel is an unofficial extension of pandas that provides a simple way to parallelize pandas operations.

| Pandas  | ![Without Pandarallel](https://github.com/nalepae/pandarallel/blob/master/docs/progress_apply.gif?raw=true)       |
| :----------------------: | ----------------------------------------------------------------------------------------------------------------- |
| **With parallelization** | ![With Pandarallel](https://github.com/nalepae/pandarallel/blob/master/docs/progress_parallel_apply.gif?raw=true) |

Pandaral·lel is available at: https://github.com/nalepae/pandarallel

# Sentiment analysis

In [1]:
import pandas as pd
from pandarallel import pandarallel
import numpy as np
import re
from textblob import TextBlob

## Loading the dataset from a CSV

In [2]:
from IPython.display import display
# Download the dataset from https://www.kaggle.com/kazanova/sentiment140
df = pd.read_csv('./tweets.csv', header=None, sep=',', 
                 names=['target', 'id', 'date', 'flag', 'user', 'text'], encoding='latin-1')
display(df)

Unnamed: 0,target,id,date,flag,user,text
0,0,1467810369,Mon Apr 06 22:19:45 PDT 2009,NO_QUERY,_TheSpecialOne_,"@switchfoot http://twitpic.com/2y1zl - Awww, t..."
1,0,1467810672,Mon Apr 06 22:19:49 PDT 2009,NO_QUERY,scotthamilton,is upset that he can't update his Facebook by ...
2,0,1467810917,Mon Apr 06 22:19:53 PDT 2009,NO_QUERY,mattycus,@Kenichan I dived many times for the ball. Man...
3,0,1467811184,Mon Apr 06 22:19:57 PDT 2009,NO_QUERY,ElleCTF,my whole body feels itchy and like its on fire
4,0,1467811193,Mon Apr 06 22:19:57 PDT 2009,NO_QUERY,Karoli,"@nationwideclass no, it's not behaving at all...."
...,...,...,...,...,...,...
1599995,4,2193601966,Tue Jun 16 08:40:49 PDT 2009,NO_QUERY,AmandaMarie1028,Just woke up. Having no school is the best fee...
1599996,4,2193601969,Tue Jun 16 08:40:49 PDT 2009,NO_QUERY,TheWDBoards,TheWDB.com - Very cool to hear old Walt interv...
1599997,4,2193601991,Tue Jun 16 08:40:49 PDT 2009,NO_QUERY,bpbabe,Are you ready for your MoJo Makeover? Ask me f...
1599998,4,2193602064,Tue Jun 16 08:40:49 PDT 2009,NO_QUERY,tinydiamondz,Happy 38th Birthday to my boo of alll time!!! ...


## Preprocessing the tweets

In [3]:
def preprocess_tweet(text):
    # Raw strings keep regex escapes such as \d and \S from being read as string escapes
    text = re.sub(r'@[A-Za-z\d]+', '', text)  # Remove @mentions
    text = re.sub(r'#[A-Za-z\d]+', '', text)  # Remove hashtags
    text = re.sub(r'https?://\S+', '', text)  # Remove hyperlinks
    # Remove emojis
    text = re.sub(
        '[\u2600-\u26FF\U0001F600-\U0001F64F\U0001F300-\U0001F5FF\U0001F680-\U0001F6FF\U0001F1E0-\U0001F1FF]', '',
        text)
    return text
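
As a quick check of the cleaning step, the sketch below applies the same patterns to a made-up tweet. The sample strings are hypothetical, and the function is repeated here only so that the block runs on its own.

```python
import re

def preprocess_tweet(text):
    text = re.sub(r'@[A-Za-z\d]+', '', text)  # Remove @mentions
    text = re.sub(r'#[A-Za-z\d]+', '', text)  # Remove hashtags
    text = re.sub(r'https?://\S+', '', text)  # Remove hyperlinks
    # Remove emojis
    text = re.sub(
        '[\u2600-\u26FF\U0001F600-\U0001F64F\U0001F300-\U0001F5FF'
        '\U0001F680-\U0001F6FF\U0001F1E0-\U0001F1FF]', '', text)
    return text

# Hypothetical sample tweet, not taken from the dataset
sample = "@alice loving the new #pandas release http://example.com \U0001F600"
cleaned = preprocess_tweet(sample)

# The substitutions leave the surrounding whitespace in place,
# so it is collapsed here purely for display
print(" ".join(cleaned.split()))  # loving the new release

# The mention pattern stops at the first character outside [A-Za-z0-9]
print(preprocess_tweet("@user_name hi"))  # _name hi
```

The second call suggests one limitation worth keeping in mind: handles containing an underscore are only partially removed by this pattern.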

In [4]:
def discretize_sentiment(text):
    text = preprocess_tweet(text)
    polarity = TextBlob(text).sentiment.polarity
    if polarity < 0:
        return 'negative'
    if polarity == 0:
        return 'neutral'
    if polarity > 0:
        return 'positive'
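
The thresholds can be checked without TextBlob by feeding polarity values in directly. The sketch below is only an illustration: `label_from_polarity` is a hypothetical helper that mirrors the branches above, and the input values are made up (TextBlob reports polarity in the range -1 to 1).

```python
def label_from_polarity(polarity):
    """Map a polarity score to a coarse sentiment label (hypothetical helper)."""
    if polarity < 0:
        return 'negative'
    if polarity == 0:
        return 'neutral'
    return 'positive'

# Made-up polarity scores covering the three branches
labels = [label_from_polarity(p) for p in (-0.5, 0.0, 0.8)]
print(labels)  # ['negative', 'neutral', 'positive']
```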

## Running the processing locally

In [6]:
pandarallel.initialize(use_memory_fs=False)

INFO: Pandarallel will run on 8 workers.
INFO: Pandarallel will use standard multiprocessing data transfer (pipe) to transfer data between the main process and workers.
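
Conceptually, a parallel apply splits the data into one chunk per worker, applies the function to each chunk, and concatenates the partial results in order. The sketch below illustrates that pattern with the standard library only. It is not pandarallel's implementation: `split_into_chunks` and `chunked_apply` are hypothetical names, and threads are used instead of worker processes purely to keep the example portable.

```python
from concurrent.futures import ThreadPoolExecutor

def split_into_chunks(items, n_chunks):
    """Split a list into n_chunks contiguous pieces of near-equal size."""
    size, extra = divmod(len(items), n_chunks)
    chunks, start = [], 0
    for i in range(n_chunks):
        # The first `extra` chunks take one additional item each
        end = start + size + (1 if i < extra else 0)
        chunks.append(items[start:end])
        start = end
    return chunks

def chunked_apply(items, func, n_workers):
    """Apply func to every item, one chunk per worker, preserving order."""
    chunks = split_into_chunks(items, n_workers)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        # Executor.map yields results in the order the chunks were submitted
        parts = list(pool.map(lambda chunk: [func(x) for x in chunk], chunks))
    # Concatenate the per-chunk results back into a single list
    return [value for part in parts for value in part]

texts = ["a", "bb", "ccc", "dddd", "eeeee"]
lengths = chunked_apply(texts, len, n_workers=2)
print(lengths)  # [1, 2, 3, 4, 5]
```

For CPU-bound work such as sentiment scoring, separate processes are generally needed to get a real speed-up in CPython, which is consistent with the multiprocessing pipe transfer mentioned in the log above.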


In [7]:
serie = df['text']

# Takes about 1 minute...
polarity = serie.parallel_apply(discretize_sentiment)

Process ForkPoolWorker-4:
Process ForkPoolWorker-7:
Process ForkPoolWorker-5:
Process ForkPoolWorker-3:
Process ForkPoolWorker-9:
Process ForkPoolWorker-2:
Process ForkPoolWorker-6:
Process ForkPoolWorker-8:
Traceback (most recent call last):
  File "/usr/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap
    self.run()
  [...]
  File "/tmp/ipykernel_799/1737721765.py", line 3, in discretize_sentiment
    polarity = TextBlob(text).sentiment.polarity
  [...]

KeyboardInterrupt: 

## Now with the help of the cloud...

In [5]:
pandarallel.initialize(use_memory_fs=False)

INFO: Pandarallel will run on 100 workers.
INFO: Pandarallel will use standard multiprocessing data transfer (pipe) to transfer data between the main process and workers.


In [6]:
serie = df['text']

polarity = serie.parallel_apply(discretize_sentiment)

2022-10-01 16:12:52,083 [INFO] lithops.config -- Lithops v2.6.0
2022-10-01 16:12:52,187 [INFO] lithops.storage.backends.aws_s3.aws_s3 -- S3 client created - Region: eu-central-1
2022-10-01 16:12:52,650 [INFO] lithops.serverless.backends.aws_lambda.aws_lambda -- AWS Lambda client created - Region: eu-central-1
2022-10-01 16:12:52,652 [INFO] lithops.invokers -- ExecutorID 2b249a-0 | JobID M000 - Selected Runtime: gfinol/python3.10:3.0 - 1769MB
2022-10-01 16:13:03,663 [INFO] lithops.invokers -- ExecutorID 2b249a-0 | JobID M000 - Starting function invocation: cloud_process_wrapper() - Total: 100 activations
2022-10-01 16:13:03,721 [INFO] lithops.invokers -- ExecutorID 2b249a-0 | JobID M000 - View execution logs at /tmp/lithops/logs/2b249a-0-M000.log
2022-10-01 16:13:20,621 [INFO] lithops.wait -- ExecutorID 2b249a-0 - Getting results from 100 function activations


    0%|          | 0/100  

2022-10-01 16:13:28,654 [INFO] lithops.executors -- ExecutorID 2b249a-0 - Cleaning temporary data


In [7]:
polarity.name = "polarity"
display(polarity)

0          positive
1           neutral
2          positive
3          positive
4          negative
             ...   
1599995    positive
1599996    positive
1599997    positive
1599998    positive
1599999    positive
Name: polarity, Length: 1600000, dtype: object

In [8]:
df.assign(polarity=polarity)

Unnamed: 0,target,id,date,flag,user,text,polarity
0,0,1467810369,Mon Apr 06 22:19:45 PDT 2009,NO_QUERY,_TheSpecialOne_,"@switchfoot http://twitpic.com/2y1zl - Awww, t...",positive
1,0,1467810672,Mon Apr 06 22:19:49 PDT 2009,NO_QUERY,scotthamilton,is upset that he can't update his Facebook by ...,neutral
2,0,1467810917,Mon Apr 06 22:19:53 PDT 2009,NO_QUERY,mattycus,@Kenichan I dived many times for the ball. Man...,positive
3,0,1467811184,Mon Apr 06 22:19:57 PDT 2009,NO_QUERY,ElleCTF,my whole body feels itchy and like its on fire,positive
4,0,1467811193,Mon Apr 06 22:19:57 PDT 2009,NO_QUERY,Karoli,"@nationwideclass no, it's not behaving at all....",negative
...,...,...,...,...,...,...,...
1599995,4,2193601966,Tue Jun 16 08:40:49 PDT 2009,NO_QUERY,AmandaMarie1028,Just woke up. Having no school is the best fee...,positive
1599996,4,2193601969,Tue Jun 16 08:40:49 PDT 2009,NO_QUERY,TheWDBoards,TheWDB.com - Very cool to hear old Walt interv...,positive
1599997,4,2193601991,Tue Jun 16 08:40:49 PDT 2009,NO_QUERY,bpbabe,Are you ready for your MoJo Makeover? Ask me f...,positive
1599998,4,2193602064,Tue Jun 16 08:40:49 PDT 2009,NO_QUERY,tinydiamondz,Happy 38th Birthday to my boo of alll time!!! ...,positive
