## Data Wrangling in PySpark
Complete the tasks below using PySpark. You can export the Spark DataFrame to Pandas when **necessary**.

### Task I
Connect to the Spark cluster by launching a SparkSession.

### Download Data

You can find the stock price data [here](https://drive.google.com/file/d/19z6AKWpKOQLpOiiLZ_QoprsPtIcOipNa/view?usp=sharing).

### Import Modules

In [3]:
from pyspark.sql import functions as f

import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

%matplotlib inline

### Load Data

In [4]:
from pyspark.sql.types import (
    DateType,
    DoubleType,
    IntegerType,
    StringType,
    StructField,
    StructType,
)

# Explicit schema: one StructField(name, type, nullable) per CSV column
data_schema = [
    StructField('_c0', IntegerType(), True),
    StructField('symbol', StringType(), True),
    StructField('data', DateType(), True),
    StructField('open', DoubleType(), True),
    StructField('high', DoubleType(), True),
    StructField('low', DoubleType(), True),
    StructField('close', DoubleType(), True),
    StructField('volume', IntegerType(), True),
    StructField('adjusted', DoubleType(), True),
    StructField('market.cap', StringType(), True),
    StructField('sector', StringType(), True),
    StructField('industry', StringType(), True),
    StructField('exchange', StringType(), True),
]

final_struc = StructType(fields=data_schema)

In [5]:
data = spark.read.csv(
    'stocks_price_final.csv',
    sep = ',',
    header = True,
    schema = final_struc
    )

In [6]:
data.printSchema()

root
 |-- _c0: integer (nullable = true)
 |-- symbol: string (nullable = true)
 |-- data: date (nullable = true)
 |-- open: double (nullable = true)
 |-- high: double (nullable = true)
 |-- low: double (nullable = true)
 |-- close: double (nullable = true)
 |-- volume: integer (nullable = true)
 |-- adjusted: double (nullable = true)
 |-- market.cap: string (nullable = true)
 |-- sector: string (nullable = true)
 |-- industry: string (nullable = true)
 |-- exchange: string (nullable = true)



### Task II
How many distinct symbols do we have for each exchange?

### Task III
What is the most expensive stock on NYSE and NASDAQ, respectively (use the latest day available)?

### Task IV
Compute the average opening and closing price per sector and convert the result into a Pandas DataFrame.

### Task V
Compute the mean and median opening and closing price per industry and convert the result into a Pandas DataFrame.

### Task VI
How many companies are there in the 'Health Care' sector?

### Task VII
Plot the average adjusted price of **Technology** sector stocks over time.