#### Common warnings:

1. __Back up your solution to the 'work' directory inside the home directory ('/home/jovyan'). It is the only directory whose state is saved across sessions.__

1. Make sure you call the right interpreter (python2 or python3). Do not write just "python" without the major version: there is no guarantee that any particular version of Python is the default in the grading system.

1. Each cell must contain only one programming language.
For example, if a cell contains Python code and you also want to run a bash command (using "!"), move the bash command to a separate cell.

1. Our IPython converter is an improved version of the standard converter nbconvert, and it processes most of Jupyter's magic commands correctly (e.g. it understands "%%bash" and executes the cell as a bash script). However, we highly recommend avoiding magics wherever possible.
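
Because the default interpreter is not guaranteed, it can help to confirm which major version a cell actually runs under. The snippet below is a minimal sketch of such a check; the helper name `interpreter_major` is hypothetical and not part of the course materials.

```python
import sys


def interpreter_major():
    """Return the major version (2 or 3) of the interpreter running this cell."""
    return sys.version_info[0]


# Written to work under both python2 and python3.
print("Running under Python %d" % interpreter_major())
```

If the printed version is not the one you expect, check which interpreter the notebook kernel or the calling command names explicitly.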

#### Spark-specific warnings:

1. It is good practice to run Spark with master "yarn". However, the performance of the containerized system is limited. If you see repeated Py4JJavaError or Py4JNetworkError exceptions that you believe are unrelated to your code, feel free to change the master to "local".

1. You should eliminate extra symbols in the output (such as quotes, brackets, etc.). Once you have the resulting dataframe, it is tempting to print `wiki.take(1)` rather than traverse the RDD with a for loop, but that prints a lot of junk symbols, such as `[['Anarchism', 'is', .. ]]`. See the correct output example in the task.
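
As a rough illustration of printing without the extra symbols, the sketch below joins the tokens of each row before printing. The `rows` list is only a stand-in for the result of `wiki.take(1)` (truncated, not real task output), and both the helper `format_row` and the space separator are assumptions; the task's output example defines the actual format.

```python
# Stand-in for wiki.take(1); the token list is truncated on purpose.
rows = [['Anarchism', 'is']]


def format_row(tokens, sep=" "):
    """Join the tokens of one row into a single string without quotes or brackets."""
    return sep.join(str(token) for token in tokens)


for row in rows:
    # Prints "Anarchism is" instead of "[['Anarchism', 'is']]".
    print(format_row(row))
```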

#### Task hint
Each of these tasks continues the previous one, so you may use the same IPython notebook for all the programming assignments this week.

In [92]:
from pyspark.sql import SparkSession
from pyspark.sql import functions as f
from pyspark.sql import Window

spark_session = SparkSession.builder.enableHiveSupport().master("yarn").getOrCreate()
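
If the "yarn" master keeps failing with errors unrelated to your code, the Spark warnings above suggest switching to "local". The sketch below wraps the builder call from the previous cell in a helper so that the master is a single parameter. The name `build_session` is hypothetical. Note that `getOrCreate()` returns an already running session, so an existing session has to be stopped before a different master takes effect.

```python
def build_session(master="yarn"):
    """Create (or reuse) a SparkSession bound to the given master."""
    # Imported lazily so the helper can be defined before Spark is needed.
    from pyspark.sql import SparkSession

    return SparkSession.builder.enableHiveSupport().master(master).getOrCreate()


# Hypothetical usage in a notebook cell:
# spark_session.stop()                      # stop the session created with "yarn"
# spark_session = build_session("local")    # recreate it with the local master
```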

In [93]:
data = spark_session.read.parquet("/data/sample264")
meta = spark_session.read.parquet("/data/meta")

In [94]:
data.show(10)

+------+-------+--------+----------+
|userId|trackId|artistId| timestamp|
+------+-------+--------+----------+
| 13065| 944906|  978428|1501588527|
|101897| 799685|  989262|1501555608|
|215049| 871513|  988199|1501604269|
|309769| 857670|  987809|1501540265|
|397833| 903510|  994595|1501597615|
|501769| 818149|  994975|1501577955|
|601353| 958990|  973098|1501602467|
|710921| 916226|  972031|1501611582|
|  6743| 801006|  994339|1501584964|
|152407| 913509|  994334|1501571055|
+------+-------+--------+----------+
only showing top 10 rows



In [95]:
meta.show(10)

+------+--------------------+--------------------+-------+
|  type|                Name|              Artist|     Id|
+------+--------------------+--------------------+-------+
| track|               Smile| Artist: Josh Groban|1223851|
| track|Chuni Ashkharhe Q...|Artist: Razmik Amyan|1215486|
| track|           Dark City|Artist: Machinae ...|1296462|
| track|       Not Sensitive|        Artist: Moby|1249694|
|artist|Artist: Carlos Pu...|Artist: Carlos Pu...|1352221|
| track|Thiz Gangsta Chit...|Artist: Tha Dogg ...|1217194|
| track|            Ruffneck|    Artist: Skrillex|1245681|
| track|              Incerc|       Artist: Spike|1193283|
|artist|Artist: Wallenber...|Artist: Wallenber...|1333444|
| track|               remix|    Artist: Flo Rida|1246378|
+------+--------------------+--------------------+-------+
only showing top 10 rows



In [96]:
# For the user with Id 776748 find all the artists
artists = (data.join(meta, data.artistId == meta.Id)
               .where((f.col('userId') == 776748) & (f.col('type') == 'artist'))
               .select('Artist', 'Name')
               .distinct()
          )

In [97]:
artists.show(10)

+--------------------+--------------------+
|              Artist|                Name|
+--------------------+--------------------+
|       Artist: Lordi|       Artist: Lordi|
|Artist: Rise Against|Artist: Rise Against|
|    Artist: Slipknot|    Artist: Slipknot|
|   Artist: Green Day|   Artist: Green Day|
|Artist: 3 Doors Down|Artist: 3 Doors Down|
|Artist: Three Day...|Artist: Three Day...|
|  Artist: Nickelback|  Artist: Nickelback|
|        Artist: Nomy|        Artist: Nomy|
|Artist: Serj Tankian|Artist: Serj Tankian|
|Artist: Thousand ...|Artist: Thousand ...|
+--------------------+--------------------+
only showing top 10 rows



In [98]:
# For the user with Id 776748 find all the tracks
tracks = (data.join(meta, data.trackId == meta.Id)
              .where((f.col('userId') == 776748) & (f.col('type') == 'track'))
              .select('Artist', 'Name')
              .distinct()
         )

In [99]:
tracks.show(10)

+--------------------+--------------------+
|              Artist|                Name|
+--------------------+--------------------+
|  Artist: Clawfinger|    Nothing Going On|
|    Artist: Gotthard|               Eagle|
| Artist: Linkin Park|          In The End|
|   Artist: Green Day|             21 Guns|
|       Artist: Lordi|Hard Rock Hallelujah|
|  Artist: Papa Roach|Getting Away With...|
|    Artist: Slipknot|      Wait And Bleed|
|        Artist: Korn|        Here To Stay|
|Artist: 3 Doors Down|          Kryptonite|
|        Artist: Nomy|             Cocaine|
+--------------------+--------------------+
only showing top 10 rows



In [100]:
# Sort the found items first by artist, then by name, in ascending order;
# keep only the columns "Artist" and "Name" and print the top 40.

tracks_artist = (artists.union(tracks)
                 .orderBy('Artist', 'Name')
                 .select('Artist', 'Name')
                 .take(40)
                )

In [101]:
for val in tracks_artist:
    print("%s %s" % val)

Artist: 3 Doors Down Artist: 3 Doors Down
Artist: 3 Doors Down Kryptonite
Artist: 311 Artist: 311
Artist: 311 Beautiful disaster
Artist: Blur Artist: Blur
Artist: Blur Girls and Boys
Artist: Clawfinger Artist: Clawfinger
Artist: Clawfinger Nothing Going On
Artist: Disturbed Artist: Disturbed
Artist: Disturbed The Vengeful One
Artist: Gotthard Artist: Gotthard
Artist: Gotthard Eagle
Artist: Green Day 21 Guns
Artist: Green Day Artist: Green Day
Artist: Green Day Kill The DJ
Artist: Iggy Pop Artist: Iggy Pop
Artist: Iggy Pop Sunday
Artist: Korn Artist: Korn
Artist: Korn Here To Stay
Artist: Linkin Park Artist: Linkin Park
Artist: Linkin Park In The End
Artist: Linkin Park Numb
Artist: Lordi Artist: Lordi
Artist: Lordi Hard Rock Hallelujah
Artist: Nickelback Artist: Nickelback
Artist: Nickelback She Keeps Me Up
Artist: Nomy Artist: Nomy
Artist: Nomy Cocaine
Artist: Papa Roach Artist: Papa Roach
Artist: Papa Roach Getting Away With Murder
Artist: Rise Against Artist: Rise Against
Artist: Ri