# Working with lists in Scala

The dataset (English Premier League statistics for the 2019-2020 season) comes from [Kaggle](https://www.kaggle.com/idoyo92/epl-stats-20192020).

In [1]:
import org.apache.spark.sql.types.{StructType, StructField, StringType, IntegerType}

Spark Web UI available at http://DESKTOP-M0P9HNP:4040
SparkContext available as 'sc' (version = 3.0.0-preview2, master = local[*], app id = local-1590269988426)
SparkSession available as 'spark'




In [2]:
val df = spark.read.format("csv").option("header", "true").load("epl2020.csv")

df: org.apache.spark.sql.DataFrame = [_c0: string, h_a: string ... 43 more fields]


In [19]:
df.show(2)

+---+---+--------+--------+--------+--------+----+------------+------+------+------+------+-------------------+----+-----+-----+---+---------+---------+----------------+----------------+---------+----------+-----+--------+-------+---------+----+-----+----+----+----+----+----+-----+----+----+----+----+-------+-------+-------+-----------------+-----------------+--------+
|_c0|h_a|      xG|     xGA|    npxG|   npxGA|deep|deep_allowed|scored|missed|  xpts|result|               date|wins|draws|loses|pts|    npxGD|   teamId|        ppda_cal|    allowed_ppda|matchtime|tot_points|round|tot_goal|tot_con|Referee.x|HS.x|HST.x|HF.x|HC.x|HY.x|HR.x|AS.x|AST.x|AF.x|AC.x|AY.x|AR.x|B365H.x|B365D.x|B365A.x|         HtrgPerc|         AtrgPerc|matchDay|
+---+---+--------+--------+--------+--------+----+------------+------+------+------+------+-------------------+----+-----+-----+---+---------+---------+----------------+----------------+---------+----------+-----+--------+-------+---------+----+-----+----+

In [3]:
val listTeams = df.select("teamId").rdd.map(r => r.getString(0)).collect().toList

listTeams: List[String] = List(Liverpool, Norwich, Man City, West Ham, Bournemouth, Brighton, Burnley, Crystal Palace, Everton, Sheffield United, Southampton, Watford, Aston Villa, Tottenham, Arsenal, Leicester, Newcastle United, Wolves, Chelsea, Man Utd, Arsenal, Burnley, Aston Villa, Bournemouth, Brighton, Everton, Liverpool, Newcastle United, Norwich, Southampton, Watford, West Ham, Man City, Tottenham, Crystal Palace, Sheffield United, Chelsea, Leicester, Man Utd, Wolves, Aston Villa, Everton, Chelsea, Norwich, Brighton, Crystal Palace, Leicester, Man Utd, Sheffield United, Southampton, Watford, West Ham, Arsenal, Liverpool, Bournemouth, Man City, Burnley, Newcastle United, Tottenham, Wolves, Man Utd, Southampton, Aston Villa, Bournemouth, Brighton, Chelsea, Crystal Palace, Leiceste...
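`listTeams` holds one entry per team per match, so every club appears many times. A minimal sketch of collapsing such a list with the standard `distinct`, `sorted` and `groupBy` methods; `sampleTeams` is a small hypothetical stand-in for the collected column, not the real data.

```scala
// Hypothetical sample standing in for the collected teamId column
val sampleTeams: List[String] =
  List("Liverpool", "Norwich", "Man City", "Liverpool", "Arsenal", "Man City")

// distinct keeps the first occurrence of each element, preserving order
val uniqueTeams: List[String] = sampleTeams.distinct

// sorted uses the implicit Ordering[String] (lexicographic order)
val uniqueSorted: List[String] = uniqueTeams.sorted

// Number of entries per team: group equal names, then count each group
val matchCounts: Map[String, Int] =
  sampleTeams.groupBy(identity).map { case (team, rows) => team -> rows.size }
```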


In [4]:
"Man Utd" :: listTeams.take(3)

res0: List[String] = List(Man Utd, Liverpool, Norwich, Man City)
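`::` (often read as "cons") prepends one element to an immutable `List` and returns a new list, leaving the original untouched. A short sketch of it next to two related operators, `:::` (concatenate two lists) and `:+` (append one element); the team names are only sample values.

```scala
val top: List[String] = List("Liverpool", "Man City")

// :: prepends a single element in constant time
val withUtd: List[String] = "Man Utd" :: top

// ::: concatenates two lists
val combined: List[String] = top ::: List("Chelsea", "Leicester")

// :+ appends at the end; on a List this copies the whole list (linear time)
val appended: List[String] = top :+ "Chelsea"

// List is immutable, so `top` itself is unchanged by all of the above
```

Operators whose name ends in `:` are right-associative in Scala, so `"Man Utd" :: top` is a call to `top.::("Man Utd")`.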


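### Yield

A `for` comprehension with `yield` returns a new collection built from the values it yields; without `yield` it only runs its body for side effects.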

In [None]:
for (team <- listTeams) yield team.toUpperCase
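A `for` comprehension with a single generator and `yield` is rewritten by the compiler into a `map` call, so the two expressions in this sketch produce the same list; `someTeams` is a hypothetical sample.

```scala
val someTeams: List[String] = List("Liverpool", "Norwich", "Man City")

// Comprehension form
val upperFor: List[String] = for (team <- someTeams) yield team.toUpperCase

// The map call the comprehension is translated into
val upperMap: List[String] = someTeams.map(team => team.toUpperCase)
```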

In [12]:
for { i <- 1 to 2 ; j <- 1 to 2 } println(s"i = $i, j = $j")

i = 1, j = 1
i = 1, j = 2
i = 2, j = 1
i = 2, j = 2
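The cell above has no `yield`, so it desugars to nested `foreach` calls that only print. Adding `yield` collects every `(i, j)` combination instead; the second value in this sketch is the `flatMap`/`map` form the compiler produces for two generators.

```scala
// Two generators with yield: one element per combination of i and j
val pairsFor: IndexedSeq[(Int, Int)] =
  for { i <- 1 to 2; j <- 1 to 2 } yield (i, j)

// Equivalent form: flatMap over the outer range, map over the inner one
val pairsDesugared: IndexedSeq[(Int, Int)] =
  (1 to 2).flatMap(i => (1 to 2).map(j => (i, j)))
```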


In [17]:
val MTeams = for (team <- listTeams if team.startsWith("M")) yield team 

MTeams: List[String] = List(Man City, Man Utd, Man City, Man Utd, Man Utd, Man City, Man Utd, Man City, Man Utd, Man City, Man City, Man Utd, Man City, Man Utd, Man City, Man Utd, Man City, Man Utd, Man City, Man Utd, Man Utd, Man City, Man Utd, Man City, Man City, Man Utd, Man City, Man Utd, Man City, Man Utd, Man City, Man Utd, Man Utd, Man City, Man City, Man Utd, Man Utd, Man City, Man Utd, Man City, Man City, Man Utd, Man Utd, Man City, Man City, Man Utd, Man City, Man Utd, Man Utd, Man City, Man Utd, Man City, Man City, Man Utd, Man Utd, Man City, Man Utd)
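A guard (`if ...`) inside the generator keeps only the elements that satisfy the condition; the compiler expresses it with `withFilter` before the `map`. As the output above shows, the result still has one entry per match, so `distinct` is needed to list each club once. `fixtureTeams` is a hypothetical sample.

```scala
val fixtureTeams: List[String] =
  List("Man City", "Liverpool", "Man Utd", "Man City", "Norwich")

// Comprehension with a guard
val mTeamsFor: List[String] =
  for (team <- fixtureTeams if team.startsWith("M")) yield team

// Roughly what the compiler generates for the comprehension
val mTeamsDesugared: List[String] =
  fixtureTeams.withFilter(team => team.startsWith("M")).map(team => team)

// filter is the everyday way to write the same selection
val mTeamsFilter: List[String] = fixtureTeams.filter(_.startsWith("M"))

// One entry per club
val mClubs: List[String] = mTeamsFilter.distinct
```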


### Fold

*The primary difference is the order in which the fold operation iterates through the collection in question. foldLeft starts on the left side—the first item—and iterates to the right; foldRight starts on the right side—the last item—and iterates to the left. fold goes in no particular order.*

Quoted from [this Coderwall article on `fold`, `foldLeft` and `foldRight`](https://coderwall.com/p/4l73-a/scala-fold-foldleft-and-foldright).
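With an associative and commutative operation such as integer `+`, all three folds give the same result. The traversal order only becomes visible with an operation that is not associative, such as subtraction, or one that records the visiting order, such as string concatenation. A small sketch on a sample list:

```scala
val nums: List[Int] = List(1, 2, 3)

// foldLeft: ((0 - 1) - 2) - 3
val leftDiff: Int = nums.foldLeft(0)(_ - _)

// foldRight: 1 - (2 - (3 - 0)); the operator receives (element, accumulator)
val rightDiff: Int = nums.foldRight(0)(_ - _)

// Building a string makes the visiting order explicit
val leftOrder: String = nums.foldLeft("")((acc, n) => acc + n)
val rightOrder: String = nums.foldRight("")((n, acc) => acc + n)

// fold promises no order, so its operator should be associative
val total: Int = nums.fold(0)(_ + _)
```

`foldLeft` and `foldRight` also allow an accumulator of a different type from the elements (a `String` for a `List[Int]` above), whereas `fold` requires the zero value and the result to share a supertype of the element type.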

In [35]:
val SumScored = df.select("scored").withColumn("Scoreds", df.col("scored").cast(IntegerType))

SumScored: org.apache.spark.sql.DataFrame = [scored: string, Scoreds: int]


In [36]:
SumScored.printSchema()

root
 |-- scored: string (nullable = true)
 |-- Scoreds: integer (nullable = true)



In [62]:
val SumScoredList = SumScored.select("Scoreds").rdd.map(r => r(0).asInstanceOf[Integer]).collect().toList

SumScoredList: List[Integer] = List(4, 1, 5, 0, 1, 3, 3, 0, 0, 1, 0, 0, 1, 3, 1, 0, 0, 0, 0, 4, 2, 1, 1, 2, 1, 1, 2, 1, 3, 1, 0, 1, 2, 2, 0, 1, 1, 1, 1, 1, 2, 0, 3, 2, 0, 2, 2, 1, 1, 2, 1, 3, 1, 3, 1, 3, 1, 1, 0, 1, 1, 1, 0, 1, 0, 2, 1, 3, 4, 1, 0, 2, 1, 2, 0, 3, 3, 2, 2, 2, 3, 1, 1, 1, 5, 0, 0, 1, 0, 1, 4, 2, 2, 3, 3, 1, 2, 2, 0, 0, 3, 1, 2, 1, 2, 0, 8, 0, 2, 0, 0, 0, 1, 0, 2, 1, 3, 2, 1, 2, 1, 0, 2, 2, 0, 2, 2, 2, 0, 1, 2, 0, 2, 2, 1, 3, 5, 0, 1, 1, 3, 0, 5, 1, 0, 1, 2, 1, 0, 0, 2, 1, 1, 0, 4, 0, 1, 2, 0, 1, 2, 0, 2, 0, 1, 1, 1, 2, 0, 0, 1, 1, 1, 1, 0, 2, 1, 1, 0, 1, 9, 0, 0, 3, 0, 3, 2, 1, 0, 1, 2, 4, 1, 1, 2, 2, 2, 3, 1, 1, 1, 0, 1, 1, 2, 0, 2, 2, 3, 0, 3, 1, 2, 1, 2, 1, 0, 2, 1, 1, 0, 2, 2, 0, 1, 3, 2, 2, 1, 1, 1, 0, 0, 2, 1, 1, 3, 2, 3, 1, 3, 2, 2, 1, 0, 3, 1, 0, 2, 2, 2, 2, 0, 2,...
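`SumScoredList` is a `List[java.lang.Integer]`; the `_ + _` in the folds that follow compiles because Scala's `Predef` implicitly unboxes `Integer` to `Int`. If a value in the column were missing or could not be cast to `IntegerType`, Spark would typically store `null` there, and unboxing that `null` throws a `NullPointerException`. A defensive sketch that drops nulls before summing, using a hypothetical boxed list rather than the real column:

```scala
// Hypothetical boxed values, with a null standing in for a missing score
val boxed: List[java.lang.Integer] =
  List(Integer.valueOf(4), null, Integer.valueOf(1), Integer.valueOf(5))

// Option(null) is None, so flatMap drops the nulls; intValue unboxes the rest
val scores: List[Int] = boxed.flatMap(x => Option(x)).map(_.intValue)

val safeTotal: Int = scores.foldLeft(0)(_ + _)
```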


In [59]:
val sumRight = SumScoredList.foldRight(0)(_ + _)

sumRight: Int = 784


In [58]:
val sumLeft = SumScoredList.foldLeft(0)(_ + _)

sumLeft: Int = 784
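Both folds return 784 because integer addition is associative and commutative, so the direction of traversal cannot change the total. For a plain sum, the standard `sum` method gives the same answer, and `reduce` performs the fold without a zero value, which means it throws on an empty list where `foldLeft` falls back to its zero. A sketch using the first five values of `SumScoredList`:

```scala
val goals: List[Int] = List(4, 1, 5, 0, 1)

val viaFoldLeft: Int = goals.foldLeft(0)(_ + _)
val viaFoldRight: Int = goals.foldRight(0)(_ + _)
val viaSum: Int = goals.sum              // uses the implicit Numeric[Int]
val viaReduce: Int = goals.reduce(_ + _)

// On an empty list the fold returns its zero value...
val emptyFold: Int = List.empty[Int].foldLeft(0)(_ + _)
// ...while reduce would throw UnsupportedOperationException,
// so reduceOption is the safe variant
val emptyReduce: Option[Int] = List.empty[Int].reduceOption(_ + _)
```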
