# PLEASE CLONE THIS NOTEBOOK INTO YOUR PERSONAL FOLDER
# DO NOT RUN CODE IN THE SHARED FOLDER

# How to Mount Your Team's Cloud Storage

## Download Databricks CLI

**Note:** All Databricks CLI commands should be run on your local computer, not in the cluster.

1. Install the Databricks CLI by running this command:
`python3 -m pip install databricks-cli`
2. In the top right corner of this UI, click on the box **databricks_poc_clus...**, then on **User Settings**, and finally on **Generate New Token**. You will only have one chance to copy the token, so save it to a safe place right away.
3. Run the following command to configure the **CLI**:
`databricks configure --token`
4. Provide this URL when prompted for the Databricks Host: `https://adb-731998097721284.4.azuredatabricks.net`
5. Paste the Token when prompted.
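
Once the steps above are done, the legacy `databricks-cli` package keeps the host and token in a profile file, `~/.databrickscfg`. The sketch below shows one way to sanity-check such a file from Python. The helper name `read_profile` and the sample text are illustrative assumptions, and the token is a placeholder, not a real credential.

```python
import configparser

def read_profile(cfg_text, profile="DEFAULT"):
    """Return (host, token) from the text of a .databrickscfg-style file."""
    parser = configparser.ConfigParser()
    parser.read_string(cfg_text)
    section = parser[profile]
    return section["host"], section["token"]

# Sample file contents for illustration; the token is a placeholder.
sample_cfg = """
[DEFAULT]
host = https://adb-731998097721284.4.azuredatabricks.net
token = dapi-placeholder-not-a-real-token
"""

host, token = read_profile(sample_cfg)
print("host:", host)
print("token configured:", bool(token))
```

To check your own setup, you could pass the contents of your local profile file instead of `sample_cfg`. Avoid printing the token itself.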

## Azure Blob Storage

**Special Note:** Only one member of the team needs to create a Storage account. The token must then be shared with the rest of the team via a Secrets ACL. Please be responsible.

### Create Storage Account
1. Navigate to https://portal.azure.com
2. Log in using your CalNet credentials *myuser@berkeley.edu*
3. Click on the top right corner on the User Icon.
4. Click on **Switch directory** and make sure you switch to **UC Berkeley berkeley.onmicrosoft.com**; this is your personal space.
5. Click on the Menu Icon on the top left corner, navigate to **Storage accounts**.
6. Choose the option **Azure for Students** to take advantage of $100 in credits. Provide your *berkeley.edu* email and follow the prompts.
7. Once the subscription is in place, navigate back to Storage accounts, refresh if needed. Hit the button **+ Create** in the top menu.
  - Choose **Azure for Students** as Subscription.
  - Create a new Resource group. Name is irrelevant here.
  - Choose a **Storage account name**; you will need this in the *Init Script* below. (e.g., jshanahan)
  - Go with the defaults for the rest of the form.
  - Hit the **Review + create** button.
8. Once the **Storage account** is shown in your list:
  - Click on it. This will open a sub-window.
  - Under *Data Storage*, click on **Containers**.
  - Hit the **+ Container** button in the top menu.
  - Choose a name for your container; you will need it in the *Init Script* below if you choose a SAS token.
  
**Note:** Create your Blob Storage in the US West Region.
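
Azure rejects names that break its naming rules, so it can help to check your choices before filling in the form. The sketch below encodes the documented rules: storage account names are 3 to 24 lowercase letters and digits; container names are 3 to 63 characters of lowercase letters, digits, and single hyphens, beginning and ending with a letter or digit. The function names are illustrative, not part of any Azure SDK.

```python
import re

# Storage account names: 3-24 characters, lowercase letters and digits only.
ACCOUNT_RE = re.compile(r"[a-z0-9]{3,24}")

# Container names: lowercase letters, digits, and single hyphens; must start
# and end with a letter or digit. The length (3-63) is checked separately.
CONTAINER_RE = re.compile(r"[a-z0-9](?:-?[a-z0-9])*")

def is_valid_account_name(name):
    """True if `name` satisfies the storage account naming rules."""
    return ACCOUNT_RE.fullmatch(name) is not None

def is_valid_container_name(name):
    """True if `name` satisfies the container naming rules."""
    return 3 <= len(name) <= 63 and CONTAINER_RE.fullmatch(name) is not None

for candidate in ["team28", "Team_28"]:
    print(candidate, "->", is_valid_account_name(candidate))
for candidate in ["w261-team28-container", "w261--team28"]:
    print(candidate, "->", is_valid_container_name(candidate))
```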

### Obtain Credentials

First, choose between an Access Key and a SAS token. A SAS token is the recommended option because you control both its permissions and its TTL (Time to Live). An Access Key, on the other hand, grants full access to the Storage Account and will generate SAS tokens in the backend when these expire.

To obtain the **Access Key**:
1. Navigate back to **Storage accounts**.
2. Click on the recently created account name.
3. In the sub-window, under *Security + networking*, click on **Access Keys**.
4. Hit the **Show keys** button.
5. Copy the **Key**; you don't need the Connection string. Either *key1* or *key2* works.

To obtain a **SAS Token**:
1. Navigate to the containers list.
2. At the far right, click on the `...` for the container you just created and choose **Generate SAS**.
3. Check the boxes of the permissions you want.
4. Select an expiration you are comfortable with.
5. Hit the **Generate SAS token and URL** button.
6. Scroll down and copy only the **Blob SAS token**.

**Note:** SAS stands for *Shared access signature*.
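
A SAS token is a query string, so the permissions and expiration you selected can be read back from it before you store it. The sketch below parses the `sp` (signed permissions) and `se` (signed expiry) fields. The helper name `describe_sas` and the token value are made up for illustration, and the signature is a placeholder.

```python
from datetime import datetime, timezone
from urllib.parse import parse_qs

def describe_sas(token):
    """Return (permissions, expiry) parsed from a SAS token string."""
    fields = parse_qs(token.lstrip("?"))   # tolerate a leading "?"
    permissions = fields["sp"][0]          # e.g. "rl" means read + list
    expiry = datetime.strptime(fields["se"][0], "%Y-%m-%dT%H:%M:%SZ")
    return permissions, expiry.replace(tzinfo=timezone.utc)

# Made-up token for illustration only; "sig" is not a real signature.
fake_token = (
    "sp=racwdl&st=2021-11-01T00:00:00Z&se=2021-12-20T08:00:00Z"
    "&spr=https&sv=2020-08-04&sr=c&sig=PLACEHOLDER"
)

permissions, expiry = describe_sas(fake_token)
print("permissions:", permissions)
print("expires:", expiry.isoformat())
```

If the expiry is earlier than the end of the project, reads and writes will start failing on that date and a new token will have to be generated and stored.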

## Store Credentials as Databricks Secret

**Special Note:** Only the member that created the Storage account should perform this step.

1. Create a scope:
`databricks secrets create-scope --scope <choose-any-name>`
2. Load the key/token:
`databricks secrets put --scope <name-from-above> --key <choose-any-name> --string-value '<paste-token-here>'`
3. Add a principal to the Secret Scope ACL to share the token with your teammates. **Careful:** make sure you type the right Team number.
`databricks secrets put-acl --scope <name-from-above> --principal team<your-team-number> --permission READ`

**Note:** This has been tested only on Mac/Linux. It might be different on Windows.
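
If you prefer to script the three steps above, the sketch below assembles the same CLI invocations as argument lists and prints them for review. It only builds the commands and does not run them. The function name `secret_commands` is hypothetical, and the scope, key, and team values are the example names used in the *Init Script*.

```python
import shlex

def secret_commands(scope, key, team_number, token="<paste-token-here>"):
    """Return the three CLI invocations from the steps above as argument lists."""
    return [
        ["databricks", "secrets", "create-scope", "--scope", scope],
        ["databricks", "secrets", "put", "--scope", scope,
         "--key", key, "--string-value", token],
        ["databricks", "secrets", "put-acl", "--scope", scope,
         "--principal", f"team{team_number}", "--permission", "READ"],
    ]

commands = secret_commands("w261-team28-scope", "w261-team28-key", 28)
for cmd in commands:
    # shlex.join quotes arguments that need it, such as the token placeholder
    print(shlex.join(cmd))
```

Each list could be passed to `subprocess.run(cmd, check=True)` on your local computer once the token placeholder has been replaced. Keeping the arguments as a list avoids shell-quoting problems with special characters in the token.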

### Init Script

In [0]:
from pyspark.sql.functions import col, max

blob_container = "w261-team28-container" # The name of your container created in https://portal.azure.com
storage_account = "team28" # The name of your Storage account created in https://portal.azure.com
secret_scope = "w261-team28-scope" # The name of the scope created on your local computer using the Databricks CLI
secret_key = "w261-team28-key" # The name of the secret key created on your local computer using the Databricks CLI
blob_url = f"wasbs://{blob_container}@{storage_account}.blob.core.windows.net"
mount_path = "/mnt/mids-w261"

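Run one of the next two cells, depending on whether you stored an Access Key or a SAS token. The Access Key cell is commented out by default; if you chose an Access Key, uncomment it and comment out the SAS Token cell instead.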

### Access Key

In [0]:
# spark.conf.set(
#   f"fs.azure.account.key.{storage_account}.blob.core.windows.net",
#   dbutils.secrets.get(scope = secret_scope, key = secret_key)
# )

### SAS Token

In [0]:
spark.conf.set(
  f"fs.azure.sas.{blob_container}.{storage_account}.blob.core.windows.net",
  dbutils.secrets.get(scope = secret_scope, key = secret_key)
)
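
The two cells above differ only in the Spark property they set: an Access Key is registered per storage account, while a SAS token is registered per container within an account. The sketch below spells out that naming pattern in plain Python. The helper `spark_conf_key` is illustrative and not part of Spark or Databricks.

```python
def spark_conf_key(storage_account, blob_container=None):
    """Return the Spark property name for an Access Key (no container given)
    or for a SAS token (container given)."""
    host = f"{storage_account}.blob.core.windows.net"
    if blob_container is None:
        return f"fs.azure.account.key.{host}"
    return f"fs.azure.sas.{blob_container}.{host}"

# Example values taken from the Init Script cell
print(spark_conf_key("team28"))
print(spark_conf_key("team28", "w261-team28-container"))
```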

## Test it!
A *Read Only* mount has been made available to all clusters in this Databricks Platform. It contains the data you will use for **HW5** and the **Final Project**. Feel free to explore the files by running the cell below.

In [0]:
display(dbutils.fs.ls(f"{mount_path}"))

path,name,size
dbfs:/mnt/mids-w261/HW5/,HW5/,0
dbfs:/mnt/mids-w261/datasets_final_project/,datasets_final_project/,0


In [0]:
# Load the weather data for Jan 1st, 2015
df_weather = spark.read.parquet(f"{mount_path}/datasets_final_project/weather_data/*").filter(col('DATE') < "2015-01-02T00:00:00.000").cache()
display(df_weather)

STATION,DATE,SOURCE,LATITUDE,LONGITUDE,ELEVATION,NAME,REPORT_TYPE,CALL_SIGN,QUALITY_CONTROL,WND,CIG,VIS,TMP,DEW,SLP,AW1,GA1,GA2,GA3,GA4,GE1,GF1,KA1,KA2,MA1,MD1,MW1,MW2,OC1,OD1,OD2,REM,EQD,AW2,AX4,GD1,AW5,GN1,AJ1,AW3,MK1,KA4,GG3,AN1,RH1,AU5,HL1,OB1,AT8,AW7,AZ1,CH1,RH3,GK1,IB1,AX1,CT1,AK1,CN2,OE1,MW5,AO1,KA3,AA3,CR1,CF2,KB2,GM1,AT5,AY2,MW6,MG1,AH6,AU2,GD2,AW4,MF1,AA1,AH2,AH3,OE3,AT6,AL2,AL3,AX5,IB2,AI3,CV3,WA1,GH1,KF1,CU2,CT3,SA1,AU1,KD2,AI5,GO1,GD3,CG3,AI1,AL1,AW6,MW4,AX6,CV1,ME1,KC2,CN1,UA1,GD5,UG2,AT3,AT4,GJ1,MV1,GA5,CT2,CG2,ED1,AE1,CO1,KE1,KB1,AI4,MW3,KG2,AA2,AX2,AY1,RH2,OE2,CU3,MH1,AM1,AU4,GA6,KG1,AU3,AT7,KD1,GL1,IA1,GG2,OD3,UG1,CB1,AI6,CI1,CV2,AZ2,AD1,AH1,WD1,AA4,KC1,IA2,CF3,AI2,AT1,GD4,AX3,AH4,KB3,CU1,CN4,AT2,CG1,CF1,GG1,MV2,CW1,GG4,AB1,AH5,CN3
3809099999,2015-01-01T00:00:00.000+0000,4.0,50.086092,-5.255711,81.38,"CULDROSE, UK",FM-12,99999,V020,"200,1,N,0077,1","00240,1,C,N",8000199,1131,991,103061,,"01,1,+00180,1,07,1","05,1,+00240,1,07,1","08,1,+00360,1,07,1",,"9,AGL ,+99999,+99999",08991011999001801999999,,,999999102131.0,"3,1,002,1,+999,9",511.0,,,39901441999.0,49901341999.0,SYN10603809 11358 82015 10113 20099 30213 40306 53002 69901 75165 887// 333 81706 85708 88712 90710 91128 91026=,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,51021.0,,,,,,,,6000021.0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,61021.0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
3809099999,2015-01-01T00:50:00.000+0000,4.0,50.086092,-5.255711,81.38,"CULDROSE, UK",FM-15,99999,V020,"210,1,N,0077,1","00183,1,C,N",8000199,1101,1001,999999,,"02,1,+00122,1,99,9","04,1,+00183,1,99,9","08,1,+00305,1,99,9",,"9,AGL ,+99999,+99999",99999021999001221999999,,,102901999999.0,,511.0,,,,,MET079METAR EGDR 010050Z 21015KT 8000 -DZ FEW004 SCT006 OVC010 11/10 Q1029 YLO1=,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
3809099999,2015-01-01T01:00:00.000+0000,4.0,50.086092,-5.255711,81.38,"CULDROSE, UK",FM-12,99999,V020,"210,1,N,0077,1","00300,1,9,N",8000199,1131,1011,103001,,"01,1,+00120,1,07,1","03,1,+00180,1,07,1","08,1,+00300,1,07,1",,"9,AGL ,+99999,+99999",08991011999001201999999,,,999999102061.0,"8,1,004,1,+999,9",511.0,,,39901341999.0,,SYN09403809 41258 82115 10113 20101 30206 40300 58004 75155 887// 333 81704 83706 88710 90710 91126=,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,51021.0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,51021.0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
3809099999,2015-01-01T01:50:00.000+0000,4.0,50.086092,-5.255711,81.38,"CULDROSE, UK",FM-15,99999,V020,"200,1,N,0082,1","00244,1,9,N",8000199,1201,1001,999999,,"04,1,+00183,1,99,9","07,1,+00244,1,99,9","08,1,+00305,1,99,9",,"9,AGL ,+99999,+99999",99999041999001831999999,,,102901999999.0,,51.0,,1441.0,,,MET086METAR EGDR 010150Z 20016G28KT 8000 HZ SCT006 BKN008 OVC010 12/10 Q1029 REDZ YLO1=,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
3809099999,2015-01-01T02:00:00.000+0000,4.0,50.086092,-5.255711,81.38,"CULDROSE, UK",FM-12,99999,V020,"200,1,N,0082,1","00240,1,C,N",8000199,1151,1001,102941,,"03,1,+00180,1,07,1","05,1,+00240,1,07,1","08,1,+00300,1,07,1",,"9,AGL ,+99999,+99999",08991031999001801999999,,,999999102011.0,"8,1,008,1,+999,9",201.0,,,39901491999.0,49901441999.0,SYN10003809 41358 82016 10115 20100 30201 40294 58008 72052 886// 333 83706 85708 88710 90710 91129 91028=,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,21021.0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,51021.0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
3809099999,2015-01-01T02:50:00.000+0000,4.0,50.086092,-5.255711,81.38,"CULDROSE, UK",FM-15,99999,V020,"210,1,N,0093,1","00122,1,9,N",6000199,1101,1101,999999,,"02,1,+00061,1,99,9","07,1,+00122,1,99,9","08,1,+00213,1,99,9",,"9,AGL ,+99999,+99999",99999021999000611999999,,,102901999999.0,,511.0,,,,,MET079METAR EGDR 010250Z 21018KT 6000 -DZ FEW002 BKN004 OVC007 11/11 Q1029 YLO2=,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
3809099999,2015-01-01T03:00:00.000+0000,4.0,50.086092,-5.255711,81.38,"CULDROSE, UK",FM-12,99999,V020,"210,1,N,0093,1","00120,1,C,N",6000199,1111,1061,102961,,"01,1,+00060,1,07,1","05,1,+00120,1,07,1","08,1,+00210,1,07,1",,"9,AGL ,+99999,+99999",08991011999000601999999,,,999999102031.0,"5,1,010,1,+999,9",501.0,,,39901441999.0,,SYN09403809 41156 82118 10111 20106 30203 40296 55010 75052 887// 333 81702 85704 88707 90710 91128=,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,21021.0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,51021.0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
3809099999,2015-01-01T03:50:00.000+0000,4.0,50.086092,-5.255711,81.38,"CULDROSE, UK",FM-15,99999,V020,"200,1,N,0082,1","00122,1,9,N",6000199,1101,1101,999999,,"02,1,+00061,1,99,9","07,1,+00122,1,99,9","08,1,+00183,1,99,9",,"9,AGL ,+99999,+99999",99999021999000611999999,,,102801999999.0,,511.0,,1341.0,,,MET082METAR EGDR 010350Z 20016G26KT 6000 -DZ FEW002 BKN004 OVC006 11/11 Q1028 YLO2=,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
3809099999,2015-01-01T04:00:00.000+0000,4.0,50.086092,-5.255711,81.38,"CULDROSE, UK",FM-12,99999,V020,"200,1,N,0082,1","00120,1,C,N",6000199,1131,1071,102901,,"01,1,+00060,1,07,1","05,1,+00120,1,07,1","08,1,+00180,1,07,1",,"9,AGL ,+99999,+99999",08991011999000601999999,,,999999101971.0,"7,1,010,1,+999,9",511.0,,,39901391999.0,49901341999.0,SYN10003809 41156 82016 10113 20107 30197 40290 57010 75152 887// 333 81702 85704 88706 90710 91127 91026=,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,21021.0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,51021.0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
3809099999,2015-01-01T04:50:00.000+0000,4.0,50.086092,-5.255711,81.38,"CULDROSE, UK",FM-15,99999,V020,"200,1,N,0082,1","00122,1,9,N",2500199,1101,1101,999999,,"04,1,+00061,1,99,9","08,1,+00122,1,99,9",,,"9,AGL ,+99999,+99999",99999041999000611999999,,,102801999999.0,,581.0,,1391.0,,,MET076METAR EGDR 010450Z 20016G27KT 2500 -RADZ SCT002 OVC004 11/11 Q1028 AMB=,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,


In [0]:
# This command will write to your Cloud Storage if the right permissions are in place.
# Navigate back to your Storage account in https://portal.azure.com to inspect the files.
df_weather.write.parquet(f"{blob_url}/weather_data_1d")

In [0]:
# Load the previously written DF as a new DF
df_weather_new = spark.read.parquet(f"{blob_url}/weather_data_1d")
display(df_weather_new)

STATION,DATE,SOURCE,LATITUDE,LONGITUDE,ELEVATION,NAME,REPORT_TYPE,CALL_SIGN,QUALITY_CONTROL,WND,CIG,VIS,TMP,DEW,SLP,AW1,GA1,GA2,GA3,GA4,GE1,GF1,KA1,KA2,MA1,MD1,MW1,MW2,OC1,OD1,OD2,REM,EQD,AW2,AX4,GD1,AW5,GN1,AJ1,AW3,MK1,KA4,GG3,AN1,RH1,AU5,HL1,OB1,AT8,AW7,AZ1,CH1,RH3,GK1,IB1,AX1,CT1,AK1,CN2,OE1,MW5,AO1,KA3,AA3,CR1,CF2,KB2,GM1,AT5,AY2,MW6,MG1,AH6,AU2,GD2,AW4,MF1,AA1,AH2,AH3,OE3,AT6,AL2,AL3,AX5,IB2,AI3,CV3,WA1,GH1,KF1,CU2,CT3,SA1,AU1,KD2,AI5,GO1,GD3,CG3,AI1,AL1,AW6,MW4,AX6,CV1,ME1,KC2,CN1,UA1,GD5,UG2,AT3,AT4,GJ1,MV1,GA5,CT2,CG2,ED1,AE1,CO1,KE1,KB1,AI4,MW3,KG2,AA2,AX2,AY1,RH2,OE2,CU3,MH1,AM1,AU4,GA6,KG1,AU3,AT7,KD1,GL1,IA1,GG2,OD3,UG1,CB1,AI6,CI1,CV2,AZ2,AD1,AH1,WD1,AA4,KC1,IA2,CF3,AI2,AT1,GD4,AX3,AH4,KB3,CU1,CN4,AT2,CG1,CF1,GG1,MV2,CW1,GG4,AB1,AH5,CN3
47739099999,2015-01-01T00:00:00.000+0000,4,34.8,138.1833333,135.0,"SHIZUOKA AIRPORT, JA",FM-15,99999,V020,"260,1,N,0098,1","99999,9,9,N",9999199,401,-301,999999,,"02,1,+00610,1,99,9",,,,"9,AGL ,+99999,+99999",99999021999006101999999,,,100401999999.0,,,,,,,MET069METAR RJNS 010000Z 26019KT 9999 FEW020 04/M03 Q1004 RMK 1CU020 A2967=,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
47739099999,2015-01-01T00:03:00.000+0000,4,34.8,138.1833333,135.0,"SHIZUOKA AIRPORT, JA",FM-16,99999,V020,"260,1,N,0093,1","99999,9,9,N",9999199,401,-301,999999,,"02,1,+00610,1,99,9",,,,"9,AGL ,+99999,+99999",99999021999006101999999,,,100401999999.0,,,,1441.0,,,MET072SPECI RJNS 010003Z 26018G28KT 9999 FEW020 04/M03 Q1004 RMK 1CU020 A2967=,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
47739099999,2015-01-01T01:00:00.000+0000,4,34.8,138.1833333,135.0,"SHIZUOKA AIRPORT, JA",FM-15,99999,V020,"260,1,N,0108,1","99999,9,9,N",9999199,401,-301,999999,,"02,1,+00610,1,99,9",,,,"9,AGL ,+99999,+99999",99999021999006101999999,,,100401999999.0,,,,1651.0,,,MET072METAR RJNS 010100Z 26021G32KT 9999 FEW020 04/M03 Q1004 RMK 1CU020 A2967=,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
47739099999,2015-01-01T02:00:00.000+0000,4,34.8,138.1833333,135.0,"SHIZUOKA AIRPORT, JA",FM-15,99999,V020,"260,1,N,0118,1","99999,9,9,N",9999199,501,-601,999999,,"02,1,+00610,1,99,9",,,,"9,AGL ,+99999,+99999",99999021999006101999999,,,100301999999.0,,,,,,,MET069METAR RJNS 010200Z 26023KT 9999 FEW020 05/M06 Q1003 RMK 2CU020 A2964=,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
47739099999,2015-01-01T02:02:00.000+0000,4,34.8,138.1833333,135.0,"SHIZUOKA AIRPORT, JA",FM-16,99999,V020,"260,1,N,0118,1","99999,9,9,N",9999199,501,-501,999999,,"02,1,+00610,1,99,9",,,,"9,AGL ,+99999,+99999",99999021999006101999999,,,100301999999.0,,,,1701.0,,,MET072SPECI RJNS 010202Z 26023G33KT 9999 FEW020 05/M05 Q1003 RMK 2CU020 A2964=,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
47739099999,2015-01-01T03:00:00.000+0000,4,34.8,138.1833333,135.0,"SHIZUOKA AIRPORT, JA",FM-15,99999,V020,"270,1,N,0108,1","01372,1,C,N",9999199,501,-1001,999999,,"02,1,+00610,1,99,9","04,1,+01372,1,99,9",,,"9,AGL ,+99999,+99999",99999021999006101999999,,,100201999999.0,,,,,,,MET093METAR RJNS 010300Z 27021KT 9999 FEW020 SCT045 05/M10 Q1002 RMK 1CU020 3CU045 A2959 P/FR=,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
47739099999,2015-01-01T03:04:00.000+0000,4,34.8,138.1833333,135.0,"SHIZUOKA AIRPORT, JA",FM-16,99999,V020,"270,1,N,0103,1","01372,1,C,N",9999199,601,-901,999999,,"02,1,+00610,1,99,9","04,1,+01372,1,99,9",,,"9,AGL ,+99999,+99999",99999021999006101999999,,,100201999999.0,,,,1541.0,,,MET091SPECI RJNS 010304Z 27020G30KT 9999 FEW020 SCT045 06/M09 Q1002 RMK 1CU020 3CU045 A2959=,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
47739099999,2015-01-01T04:00:00.000+0000,4,34.8,138.1833333,135.0,"SHIZUOKA AIRPORT, JA",FM-15,99999,V020,"260,1,N,0108,1","01372,1,C,N",9999199,501,-901,999999,,"02,1,+00610,1,99,9","04,1,+01372,1,99,9",,,"9,AGL ,+99999,+99999",99999021999006101999999,,,100101999999.0,,,,1601.0,,,MET091METAR RJNS 010400Z 26021G31KT 9999 FEW020 SCT045 05/M09 Q1001 RMK 1CU020 3CU045 A2957=,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
47739099999,2015-01-01T05:00:00.000+0000,4,34.8,138.1833333,135.0,"SHIZUOKA AIRPORT, JA",FM-15,99999,V020,"290,1,V,0103,1","99999,9,9,N",9999199,401,-801,999999,,"02,1,+00914,1,99,9",,,,"9,AGL ,+99999,+99999",99999021999009141999999,,,100101999999.0,,,,1601.0,,,MET085METAR RJNS 010500Z 29020G31KT 250V310 9999 FEW030 04/M08 Q1001 RMK 1CU030 A2957=,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
47739099999,2015-01-01T06:00:00.000+0000,4,34.8,138.1833333,135.0,"SHIZUOKA AIRPORT, JA",FM-15,99999,V020,"280,1,N,0124,1","99999,9,9,N",9999199,301,-701,999999,,"02,1,+00914,1,99,9",,,,"9,AGL ,+99999,+99999",99999021999009141999999,,,100201999999.0,,,,1851.0,,,MET072METAR RJNS 010600Z 28024G36KT 9999 FEW030 03/M07 Q1002 RMK 1CU030 A2959=,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,


In [0]:
print(f"Your new df_weather has {df_weather_new.count():,} rows.")
print(f'Max date: {df_weather_new.select([max("DATE")]).collect()[0]["max(DATE)"].strftime("%Y-%m-%d %H:%M:%S")}')

In [0]:
display(dbutils.fs.ls(f"{mount_path}/HW5"))

path,name,size
dbfs:/mnt/mids-w261/HW5/all-pages-indexed-in.txt,all-pages-indexed-in.txt,2143300687
dbfs:/mnt/mids-w261/HW5/all-pages-indexed-out.txt,all-pages-indexed-out.txt,2090459616
dbfs:/mnt/mids-w261/HW5/indices.txt,indices.txt,517438296
dbfs:/mnt/mids-w261/HW5/test_graph.txt,test_graph.txt,167


## Using RDD API

When reading or writing with the RDD API, the storage configuration cannot be set at runtime; it has to be set at cluster creation.
Ping Luis Villarreal with the following information so it can be added to your cluster as Spark Configuration:
- Storage Account name
- Container name
- Secret Scope name
- Secret Key name

**Important:** Do not share the actual SAS token.

The Spark Configuration entry takes the form shown below. Once it has been added to your cluster, run the cells that follow to test the Hadoop plug-in that connects to your Azure Blob Storage.
```
spark.hadoop.fs.azure.sas.{container_name}.{storage_account}.blob.core.windows.net {{secrets/{scope}/{key}}}
```
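
To see how the four values you send map onto that entry, the sketch below fills in the template with the example names from the *Init Script*. The function name is hypothetical. Note that the result contains only a `{{secrets/...}}` reference to the scope and key, never the SAS token itself.

```python
def rdd_spark_config_line(container_name, storage_account, scope, key):
    """Fill in the Spark Configuration template with concrete names."""
    prop = (
        f"spark.hadoop.fs.azure.sas.{container_name}."
        f"{storage_account}.blob.core.windows.net"
    )
    # The double braces are literal: they reference the secret by name,
    # so the token value never appears in the configuration text.
    secret_ref = "{{secrets/" + scope + "/" + key + "}}"
    return f"{prop} {secret_ref}"

line = rdd_spark_config_line(
    "w261-team28-container", "team28", "w261-team28-scope", "w261-team28-key"
)
print(line)
```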

In [0]:
rdd = sc.textFile('/mnt/mids-w261/HW5/test_graph.txt')

parsed_rdd = rdd.map(lambda line: tuple(line.split('\t')))
parsed_rdd.take(3)
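
For reference, the `map` above applies `tuple(line.split('\t'))` to every line. The plain-Python sketch below shows what that does on a few hypothetical lines, which are NOT the actual contents of `test_graph.txt`. In particular, a line without a tab becomes a one-element tuple.

```python
# Hypothetical input lines for illustration only.
sample_lines = ["a\tfirst payload", "b\tsecond payload", "no tab here"]

# Same transformation as the lambda passed to rdd.map
parsed = [tuple(line.split("\t")) for line in sample_lines]

for record in parsed:
    print(len(record), record)
```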

In [0]:
parsed_rdd.saveAsTextFile(f"{blob_url}/graph_test")