# Amazon food reviews Sentiment Analysis using PySpark

### Introductory notes:

- Spark exposes three data structures through its APIs: RDD, DataFrame (not the same thing as a pandas DataFrame), and Dataset. The typed Dataset API exists only in Scala and Java, so PySpark code works with RDDs and DataFrames.
- DataFrames are usually faster than RDDs because they carry a schema (metadata describing the columns and their types), which lets Spark optimize the query plan.
- The RDD-based machine-learning API is `pyspark.mllib` (Spark MLlib), while `pyspark.ml` (Spark ML) works with DataFrames.

In [1]:
import pyspark as ps
from pyspark.sql import SQLContext
from pyspark import SparkConf, SparkContext
import warnings

warnings.filterwarnings('ignore')

In [2]:
# local[*] : Run Spark locally with as many worker threads as logical cores on your machine.
conf = SparkConf().setMaster("local[*]").setAppName("Sentiment_Analysis")
sc = SparkContext(conf = conf)
sqlcontext = SQLContext(sc)

In [3]:
sc

#### I will use the cleaned data from my Amazon Food Review project, which you can find in the same repository.

In [5]:
path = "./Data/final_cleaned.csv"
# Spark 2.0+ ships a built-in CSV reader, so the external
# 'com.databricks.spark.csv' package name is not needed.
data = sqlcontext.read.format('csv').\
                 options(header='true', inferSchema='true').load(path)
data.printSchema()

root
 |-- _c0: integer (nullable = true)
 |-- Id: integer (nullable = true)
 |-- ProductId: string (nullable = true)
 |-- UserId: string (nullable = true)
 |-- ProfileName: string (nullable = true)
 |-- HelpfulnessNumerator: string (nullable = true)
 |-- HelpfulnessDenominator: string (nullable = true)
 |-- Score: string (nullable = true)
 |-- Time: string (nullable = true)
 |-- Summary: string (nullable = true)
 |-- Text: string (nullable = true)
 |-- sentiment: string (nullable = true)



In [10]:
data.show(5,truncate=True)

+---+---+----------+--------------+--------------------+--------------------+----------------------+-----+----------+--------------------+--------------------+---------+
|_c0| Id| ProductId|        UserId|         ProfileName|HelpfulnessNumerator|HelpfulnessDenominator|Score|      Time|             Summary|                Text|sentiment|
+---+---+----------+--------------+--------------------+--------------------+----------------------+-----+----------+--------------------+--------------------+---------+
|  0|  1|B001E4KFG0|A3SGXH7AUHU8GW|          delmartian|                   1|                     1|    5|1303862400|Good Quality Dog ...|bought sever vita...|  postive|
|  1|  2|B00813GRG4|A1D87F6ZCVE5NK|              dll pa|                   0|                     0|    1|1346976000|   Not as Advertised|product arriv lab...| negative|
|  2|  3|B000LQOCH0| ABXLMWJIXXAIN|"Natalia Corres "...|                   1|                     1|    4|1219017600|"""Delight"" says...|confect arou