Big Data Essentials: HDFS, MapReduce and Spark RDD - COURSERA

saumyatiwari/bigEssencial-YADEX

 
 


DOCKERHUB: https://hub.docker.com/u/bigdatateam/

For the WEEK 3 practice exercises, use:

bigdatateam/yarn-notebook: https://hub.docker.com/r/bigdatateam/yarn-notebook/

WEEK 4

I installed pyspark for Python. After installing it, running pyspark gave this error:

asusn56@nautilus:~$ pyspark
Could not find valid SPARK_HOME while searching ['/home', '/usr/local/bin']
/usr/local/bin/pyspark: línea 24: /bin/load-spark-env.sh: No existe el archivo o el directorio
/usr/local/bin/pyspark: línea 77: /bin/spark-submit: No existe el archivo o el directorio

To fix it, run this command:

asusn56@nautilus:~$ PYSPARK_PYTHON=python3 SPARK_HOME=/usr/local/lib/python3.6/dist-packages/pyspark pyspark

where SPARK_HOME is the directory in which pyspark is installed. To find this path, run:

pip show pyspark

and read the Location field. SPARK_HOME is that location followed by /pyspark. Then use this path in the command line. :)
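The SPARK_HOME lookup can also be done from Python itself. This is a minimal sketch, assuming pyspark was installed with pip into the same interpreter that runs the script; package_dir is a hypothetical helper name, not part of pyspark:

```python
import importlib.util
import os


def package_dir(name):
    """Return the directory an importable package lives in, or None if absent."""
    spec = importlib.util.find_spec(name)
    # Missing packages give no spec; namespace packages may have no origin file.
    if spec is None or spec.origin is None:
        return None
    # origin points at the package's __init__.py; its parent is the package dir.
    return os.path.dirname(os.path.abspath(spec.origin))


if __name__ == "__main__":
    spark_home = package_dir("pyspark")
    if spark_home is None:
        print("pyspark is not importable from this interpreter")
    else:
        print("SPARK_HOME=" + spark_home)
```

Run it with the interpreter you intend to use for pyspark (for example python3), since each interpreter can have its own package directory.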

Another solution is to set the SPARK_HOME environment variable permanently on your system.
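For the permanent option, one possibility is to add export lines to a shell startup file. This is a minimal sketch, assuming a bash-like shell that reads ~/.bashrc; export_lines is a hypothetical helper, and the path in the example is the one from the command shown earlier, which may differ on your machine:

```python
import shlex


def export_lines(spark_home, python="python3"):
    """Build shell export lines for SPARK_HOME and PYSPARK_PYTHON."""
    # shlex.quote protects paths that contain spaces or shell metacharacters.
    return [
        "export SPARK_HOME=" + shlex.quote(spark_home),
        "export PYSPARK_PYTHON=" + shlex.quote(python),
    ]


if __name__ == "__main__":
    # Path taken from the example command; adjust it to your own install.
    for line in export_lines("/usr/local/lib/python3.6/dist-packages/pyspark"):
        print(line)
```

Appending the printed lines to ~/.bashrc (an assumption about your shell) and opening a new terminal should make a plain pyspark call find SPARK_HOME.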

WEEK 5

TLC Trip Record Data FULL DATASET: https://www1.nyc.gov/site/tlc/about/tlc-trip-record-data.page

WEEK 6

TELCO dataset: https://dandelion.eu/datagems/SpazioDati/telecom-sms-call-internet-mi/description/
