# Learning Objectives

In this notebook, you will build ETL jobs that read from and write to a variety of common data sources, such as:
- REST APIs (HTTP endpoints)
- RDBMS
- Hive tables (managed tables)
- Various file formats (csv, json, parquet, etc.)


# Interview Questions

As you progress through the practice, attempt to answer the following questions:

## Columnar File
- What is a columnar file format and what advantages does it offer?
- Why is Parquet frequently used with Spark and how does it function?
- How do you read/write data from/to a Parquet file using a DataFrame?

## Partitions
- How do you save data to a file system by partitions? (Hint: Provide the code)
- How and why can partitions reduce query execution time? (Hint: Give an example)

## JDBC and RDBMS
- How do you load data from an RDBMS into Spark? (Hint: Discuss the steps and JDBC)

## REST API and HTTP Requests
- How can Spark be used to fetch data from a REST API? (Hint: Discuss making API requests)

## ETL Job One: Parquet file
### Extract
Extract data from the managed tables (e.g. `bookings_csv`, `members_csv`, and `facilities_csv`)

### Transform
Data transformation requirements: https://pgexercises.com/questions/aggregates/fachoursbymonth.html

### Load
Load data into a parquet file

### What is Parquet? 

Columnar file formats such as Parquet are an important technique for optimizing Spark queries, and they come up often in interviews.
- https://www.youtube.com/watch?v=KLFadWdomyI
- https://www.databricks.com/glossary/what-is-parquet
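
To make the columnar idea concrete, here is a toy sketch in plain Python. It is not how Parquet is implemented; it only illustrates why an aggregate over one column touches less data when values are stored column by column. The sample rows and the helper `to_columnar` are made up for illustration.

```python
# Toy data: three booking rows (values are made up for illustration).
rows = [
    {"facid": 0, "memid": 3, "slots": 2},
    {"facid": 1, "memid": 7, "slots": 4},
    {"facid": 0, "memid": 5, "slots": 1},
]


def to_columnar(records):
    """Pivot a list of row dicts into a dict of column lists."""
    columns = {}
    for record in records:
        for name, value in record.items():
            columns.setdefault(name, []).append(value)
    return columns


columns = to_columnar(rows)

# Row layout: summing one field still walks every field of every row.
values_touched_row = sum(len(r) for r in rows)
# Columnar layout: only the "slots" column has to be read.
values_touched_col = len(columns["slots"])
total_slots = sum(columns["slots"])

print(total_slots, values_touched_row, values_touched_col)  # 7 9 3
```

A real columnar reader gets the same effect at the file level: a query that selects only `facid` and `slots` can skip the bytes of every other column.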

In [0]:
# Write your solution here
from pyspark.sql import functions as F  # avoids shadowing Python's built-in sum

# Use the same path for writing and reading
parquet_path = "/bookings.parquet"

bookingsDF = spark.sql("SELECT * FROM bookings")
# Keep only bookings that start in September 2012
bookingsDF = bookingsDF.filter((bookingsDF["starttime"] >= '2012-09-01 00:00:00') & (bookingsDF["starttime"] < '2012-10-01 00:00:00'))
# Total slots booked per facility
bookingsDF = bookingsDF.groupBy("facid").agg(F.sum("slots").alias("Total Slots"))

bookingsDF.write.mode("overwrite").parquet(parquet_path)
bookingsParquet = spark.read.parquet(parquet_path)

# Sort after reading: row order is not guaranteed to survive a Parquet round trip
bookingsParquet.orderBy("Total Slots").show()



+-----+-----------+
|facid|Total Slots|
+-----+-----------+
|    5|        122|
|    3|        422|
|    7|        426|
|    8|        471|
|    6|        540|
|    2|        570|
|    1|        588|
|    0|        591|
|    4|        648|
+-----+-----------+



## ETL Job Two: Partitions

### Extract
Extract data from the managed tables (e.g. `bookings_csv`, `members_csv`, and `facilities_csv`)

### Transform
Transform the data as described at https://pgexercises.com/questions/joins/threejoin.html

### Load
Partition the result data by the facility column and then save it to the `threejoin_delta` managed table.

hint: https://spark.apache.org/docs/latest/api/python/reference/pyspark.sql/api/pyspark.sql.DataFrameWriter.partitionBy.html

### What are partitions?

Partitions are an important technique for optimizing Spark queries.
- https://www.youtube.com/watch?v=hvF7tY2-L3U&t=268s
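
As a rough illustration of why partitioning helps, the sketch below mimics a Hive-style `column=value` directory layout in plain Python with CSV files. It is not Spark's implementation; `write_partitioned`, `read_partition`, the file name `part-0.csv`, and the sample rows are all hypothetical. The point is that a filter on the partition column only needs to open one directory.

```python
import csv
import tempfile
from pathlib import Path


def write_partitioned(rows, root, partition_col):
    """Write rows into Hive-style `col=value` subdirectories under root."""
    for row in rows:
        part_dir = Path(root) / f"{partition_col}={row[partition_col]}"
        part_dir.mkdir(parents=True, exist_ok=True)
        # The partition value lives in the directory name, not in the file.
        data = {k: v for k, v in row.items() if k != partition_col}
        path = part_dir / "part-0.csv"
        is_new = not path.exists()
        with path.open("a", newline="") as f:
            writer = csv.DictWriter(f, fieldnames=list(data))
            if is_new:
                writer.writeheader()
            writer.writerow(data)


def read_partition(root, partition_col, value):
    """Read only the directory for one partition value (partition pruning)."""
    path = Path(root) / f"{partition_col}={value}" / "part-0.csv"
    with path.open(newline="") as f:
        return [{**record, partition_col: value} for record in csv.DictReader(f)]


# Illustrative sample rows
rows = [
    {"member": "Anne Baker", "name": "Tennis Court 1"},
    {"member": "Anne Baker", "name": "Tennis Court 2"},
    {"member": "David Jones", "name": "Tennis Court 1"},
]

root = tempfile.mkdtemp()
write_partitioned(rows, root, "name")
partition_dirs = sorted(p.name for p in Path(root).iterdir())
court_1 = read_partition(root, "name", "Tennis Court 1")

print(partition_dirs)
print(court_1)
```

A query such as `WHERE name = 'Tennis Court 1'` benefits in the same way: files under the other partition directories never have to be scanned.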

In [0]:
# Write your solution here
from pyspark.sql.functions import concat_ws

members = spark.sql("SELECT * FROM members")
bookings = spark.sql("SELECT * FROM bookings")
facilities = spark.sql("SELECT * FROM facilities")

dataframe = (
    members
    .join(bookings, "memid", "inner")
    .join(facilities, "facid", "inner")
    .filter(facilities["name"].like("Tennis Court%"))
    .withColumn("member", concat_ws(" ", members["firstname"], members["surname"]))
    .select("member", "name").distinct().orderBy(["member", "name"], ascending = True)
)

spark.sql("DROP TABLE IF EXISTS threejoin_delta")
dbutils.fs.rm("dbfs:/user/hive/warehouse/threejoin_delta", recurse=True)

dataframe.write.partitionBy("name").mode("overwrite").format("parquet").saveAsTable("threejoin_delta")

#display(spark.read.parquet("/user/hive/warehouse/threejoin_delta/name=Tennis Court 1"))
display(spark.sql("SELECT * FROM threejoin_delta WHERE name = 'Tennis Court 1'"))
#display(spark.read.table("threejoin_delta").filter("name = 'Tennis Court 1'"))


member,name
Anne Baker,Tennis Court 1
Burton Tracy,Tennis Court 1
Charles Owen,Tennis Court 1
David Farrell,Tennis Court 1
David Jones,Tennis Court 1
David Pinker,Tennis Court 1
Douglas Jones,Tennis Court 1
Erica Crumpet,Tennis Court 1
Florence Bader,Tennis Court 1
GUEST GUEST,Tennis Court 1


## ETL Job Three: HTTP Requests

### Extract
Extract daily stock price data for the following companies: Google, Apple, Microsoft, and Tesla.

Data Source
- API: https://rapidapi.com/alphavantage/api/alpha-vantage
- Endpoint: GET `TIME_SERIES_DAILY`

Sample HTTP request

```
curl --request GET \
	--url 'https://alpha-vantage.p.rapidapi.com/query?function=TIME_SERIES_DAILY&symbol=TSLA&outputsize=compact&datatype=json' \
	--header 'X-RapidAPI-Host: alpha-vantage.p.rapidapi.com' \
	--header 'X-RapidAPI-Key: [YOUR_KEY]'

```

Sample Python HTTP request

```
import requests

url = "https://alpha-vantage.p.rapidapi.com/query"

querystring = {
    "function":"TIME_SERIES_DAILY",
    "symbol":"IBM",
    "datatype":"json",
    "outputsize":"compact"
}

headers = {
    "X-RapidAPI-Host": "alpha-vantage.p.rapidapi.com",
    "X-RapidAPI-Key": "[YOUR_KEY]"
}

response = requests.get(url, headers=headers, params=querystring)

data = response.json()

# Now 'data' contains the daily time series data for "IBM"
```
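
The response nests the daily prices under a `Time Series (Daily)` key, with the closing price stored as a string under `4. close`. A minimal sketch of flattening that structure into rows, assuming this response shape; the helper `parse_daily_series` and the sample payload values are hypothetical:

```python
from datetime import datetime


def parse_daily_series(ticker, payload):
    """Flatten an API response dict into (ticker, date, close) tuples, oldest first."""
    # The key is absent when the API returns something other than daily data.
    series = payload.get("Time Series (Daily)", {})
    return [
        (ticker, datetime.strptime(day, "%Y-%m-%d").date(), float(values["4. close"]))
        for day, values in sorted(series.items())
    ]


# Made-up payload in the assumed shape (prices are not real quotes).
sample_payload = {
    "Time Series (Daily)": {
        "2024-01-03": {"4. close": "101.50"},
        "2024-01-02": {"4. close": "100.25"},
    }
}

rows = parse_daily_series("IBM", sample_payload)
no_rows = parse_daily_series("IBM", {"unexpected": "body"})
print(rows)
print(no_rows)
```

Tuples in this form can be passed to `spark.createDataFrame` together with a matching schema.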

### Transform
Find **weekly** max closing price for each company.

Hints:
  - Use a `for` loop to get stock data for each company
  - Use the Spark `union` operation to concatenate all data into one DF
  - Create a new `week` column from the date column
  - Use `group by` to calculate the max closing price
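
The grouping logic can be checked on a small sample before running it in Spark. The plain-Python sketch below groups by ISO year and ISO week; the helper `weekly_max_close` and the prices are made up for illustration. Including the year is a deliberate choice here: if the fetched window spans a new year, grouping on the week number alone would merge weeks from different years.

```python
from datetime import date


def weekly_max_close(rows):
    """Return {(company, iso_year, iso_week): max close} for (company, date, close) rows."""
    result = {}
    for company, day, close in rows:
        iso_year, iso_week, _ = day.isocalendar()
        key = (company, iso_year, iso_week)
        result[key] = max(close, result.get(key, close))
    return result


# Made-up prices; the dates are chosen to straddle a year boundary.
rows = [
    ("TSLA", date(2024, 12, 30), 10.0),  # ISO week 1 of 2025
    ("TSLA", date(2025, 1, 2), 12.0),    # ISO week 1 of 2025
    ("TSLA", date(2024, 1, 3), 99.0),    # ISO week 1 of 2024
]

weekly = weekly_max_close(rows)
print(weekly)
```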

### Load
- Partition the `DF` by company
- Load the DF into a managed table called `max_closing_price_weekly`

In [0]:
# Write your solution here
from pyspark.sql.types import StructType, StructField, StringType, DateType, FloatType
from pyspark.sql.functions import weekofyear, max
import requests, time
from datetime import datetime

url = "https://alpha-vantage.p.rapidapi.com/query"

tickers = ["GOOG", "AAPL", "MSFT", "TSLA"]

schema = StructType([
    StructField("company", StringType(), True),
    StructField("date", DateType(), True),
    StructField("close", FloatType(), True)
])

final_df = spark.createDataFrame([], schema=schema)

headers = {
    "X-RapidAPI-Host": "alpha-vantage.p.rapidapi.com",
    "X-RapidAPI-Key": "[YOUR_KEY]"  # replace with your own key; do not commit real keys
}

for ticker in tickers:
    querystring = {
        "function": "TIME_SERIES_DAILY",
        "symbol": ticker,
        "datatype": "json",
        "outputsize": "compact"
    }
    response = requests.get(url, headers=headers, params=querystring)
    response.raise_for_status()  # fail fast on HTTP errors
    # Falls back to an empty dict if the response carries no daily series
    data = response.json().get("Time Series (Daily)", {})

    # One (company, date, close) tuple per trading day
    stock_data = [(ticker, datetime.strptime(date, "%Y-%m-%d").date(), float(values["4. close"])) for date, values in data.items()]

    stock_df = spark.createDataFrame(stock_data, schema=schema)

    final_df = final_df.union(stock_df)
    time.sleep(1)  # brief pause between requests

final_df = ( 
    final_df
    .withColumn("week", weekofyear("date"))
    .groupBy("company", "week").agg(max("close").alias("weekly max"))
    .orderBy("company", "week")
)

spark.sql("DROP TABLE IF EXISTS max_closing_price_weekly")
dbutils.fs.rm("dbfs:/user/hive/warehouse/max_closing_price_weekly", recurse=True)
final_df.write.partitionBy("company").mode("overwrite").format("parquet").saveAsTable("max_closing_price_weekly")

display(spark.sql("SELECT * FROM max_closing_price_weekly"))



week,weekly max,company
1,193.13,GOOG
2,197.96,GOOG
3,197.55,GOOG
4,201.9,GOOG
5,205.6,GOOG
6,207.71,GOOG
7,188.2,GOOG
8,187.13,GOOG
9,181.19,GOOG
10,175.75,GOOG


## ETL Job Four: RDBMS


### Extract
Extract RNA data from a public PostgreSQL database.

- https://rnacentral.org/help/public-database
- Extract 100 RNA records from the `rna` table (hint: use `limit` in your sql)
- hint: use `spark.read.jdbc` https://docs.databricks.com/external-data/jdbc.html
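
The JDBC read boils down to a handful of connection options. A minimal sketch of assembling them, assuming the PostgreSQL driver is available on the cluster; the helper `build_jdbc_options` and the environment variable `RNACENTRAL_PASSWORD` are hypothetical names used for illustration:

```python
import os


def build_jdbc_options(host, port, database, query, user, password, fetchsize=1000):
    """Assemble the option dict for Spark's JDBC data source."""
    return {
        "url": f"jdbc:postgresql://{host}:{port}/{database}",
        "query": query,  # the database runs this query; only its result reaches Spark
        "user": user,
        "password": password,
        "driver": "org.postgresql.Driver",
        "fetchsize": str(fetchsize),  # rows fetched per round trip
    }


options = build_jdbc_options(
    host="hh-pgsql-public.ebi.ac.uk",
    port=5432,
    database="pfmegrnargs",
    query="SELECT * FROM rna LIMIT 100",
    user="reader",
    # Hypothetical variable name; the credentials are listed on the RNAcentral help page.
    password=os.environ.get("RNACENTRAL_PASSWORD", ""),
)

# In a Spark session the dict can be passed in one call:
# rna_df = spark.read.format("jdbc").options(**options).load()
print(options["url"])
```

Pushing the `LIMIT` into the `query` option keeps the transfer small, because the database applies it before any rows are sent to Spark.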

### Transform
We want to load the data as is, so no transformation is required.


### Load
Load the DF into a managed table called `rna_100_records`

In [0]:
# Write your solution here

jdbc_url = "jdbc:postgresql://hh-pgsql-public.ebi.ac.uk:5432/pfmegrnargs"
rna_query = "SELECT * FROM rna LIMIT 100"
rna_table = (spark.read
             .format("jdbc")
             .option("url", jdbc_url)
             .option("query", rna_query)
             .option("user", "reader")
             .option("password", "NWDMCE5xdipIjRrp")
             .option("driver", "org.postgresql.Driver") 
             .load()
             )

spark.sql("DROP TABLE IF EXISTS rna_100_records")

rna_table.write.mode("overwrite").saveAsTable("rna_100_records")

display(spark.sql("SELECT * FROM rna_100_records"))



id,upi,timestamp,userstamp,crc64,len,seq_short,seq_long,md5
8988357,URS00008926C5,2015-10-20T18:04:07.000+0000,RNACEN,F9626977AB4E17FB,1336,TCAGCGGCGAACGGGTGAGTAACACGTGGGTGACTTGCCCCGAAGATGGGGATAACCTCTGGAAACGGGGGCTAATACCCAATGTGCTCGGTGATTCGGTTCATCGAGTAAAGCTCCGGCGCTTCGGGAGAGGCCTGCGGCCCATCAGCTAGTTGGTAGGGTAACGGCCTACCAAGGCAGAGGCGGGTAGGGGGCGTGAGAGCGCGGACCCCCACACTGGCACTGAGATACGGGCCAGACTCCTACGGGAGGCAGCAGTAAGGGATATTGCGCAATGGACGAAAGTCAGACGCAGCGACGCCGCGTGGGCGATGAAGGCCTTCGGGTTGTAAAGCCCTTTTATGGGGGAAGAGAAAAAGGACGGTACCCCAGGAATAAGTCCCGGCTAACTACGTGCCAGCAGCCGCGGTAAAACGTAGGGGACAAGCGTTATCCGGATTCACTGGGCGTAAAGAGCGTTGAGGCGGTTCCGTAAGTTGGGCGTGAAAGCTCCGGGCTTAACTCGGAGATGTCGTTCAATACTGCGGGGCTTGAGGACAGCAGAGGAAGGTGGAATTCCCGGTGTAGTGGTGAAATGCGTAGATATCGGGAGGAACACCCGTGGCGAAGGCGGCCTTCTGGGCTGTTCCTGACGCTGAAGGCGAAAGCTAGGGGAGCGAACGGGATTAGATACCCCGGTAGTCCTAGCTGTAAACGATGGATGCTGGGTGTGGGGGGTGTAAATTCCCTCTGTGCCGAAGCAAACGCGTTAAGCATCCCGCCTGGGGACTACGGCCGCAAGGCTAAAACTCAAACGAATTGACGGGGGCCCGCACAAGCAGCGGAGCGTGTGGTTTAATTCGATGCTACACGAAGAACCTTACCTGGGTTTGACATGCACGTGGTAGGGAACCGAAAGGGGACCGACCTTCGGGAGCGTGCACAGGTGCTGCATGGCTGTCGTCAGCTCGTGCCGTGAGGTGTCGGGTTAAGTCCCGTAACGAGCGCAACCCTTGCCCTTAGTTACAAGTGTCTAAGGGGACTGCCCGGGACAACTGGGAGGAAGGTGGGGATGACGTCAAGTCAGCATGGCCTTTATATCCAGGGCTACACACACGCTACAATGGCCGGTACAATAGGTTGCGAAGTCGTGAGGCGGAGCCAATCCTCAAAGCCGGTCTCAGTTCGAATTGCAGTCTGCAACTCGACTGCATGAAGCTGGAGTTGCTAGTAATCGCAGGTCAGCTATACTGCGGTGATACGTTCCCGGGCCTTGTACACACCGCCCGTCACGTCATGGAAGCTGGCAACGCCTGAAGCCGGTGAGCTAACCCGAAAGGGAGGCAGCCGTCGAGGG,,fe4792a9218a34fdee33c9c52c548cf7
8988360,URS00008926C8,2015-10-20T18:04:07.000+0000,RNACEN,DEA611A8ABDE9078,1307,ACTGCTATCGGATTGATACTAAGCCATGCGAGTCATTGTAGCAATACAAGGCATACGGCTCAGTAACGCGTAGTCAACCTAACCTATGGACGGGAATAACCTCGGGAAACTGAGAATAATGCCCGATAGAACATTATGCCTGGAATGGTTTATGTTCCAAATGATTTATCGCCGTAGGATGGGACTGCGGCCTATCAGTTTGTTGGTGAGGTAATGGCCCACCAAGACTATTACAGGTACGGGCTCTGAGAGGAGTAGCCCGGAGATGGGTACTGAGACACGGACCCAGGCCCTATGGGGCGCAGCAGGCGAGAAAACTTTGCAATGTGCGAAAGCACGACAAGGTTAATCCGAGTGATTTGTGCTAAACGAATCTTTTGTTAGTCCTAGAAACACTAACGAATAAGGGGTGGGCAAGTTCTGGTGTCAGCCGCCGCGGTAAAACCAGCACCTCAAGTGGTCAGGATGATTATTGGGCCTAAAGCATCCGTAGCCGGCCCTGTAAGTTTTCGGTTAAATCTGTACGCTTAACGTACAGGCTGCCGGGAATACTGCAGAGCTAGGGAGTGGGAGAAGTAGACGGTACTCGGTAGGAAGTGGTAAAATGCTTTGATCTATCGATGACCACCTGTGGCGAAGGCGGTCTACTAGAACACGTCCGACGGTGAGGGATGAAAGCTGGGGGAGCAAACCGGATTAGATACCCGGGTAGTCCCAGCTGTAAACTATGCAAACTCAGTGATGCATTGGCTTGTGGCCAATGCAGTGCTGCAGGGAAGCCGTTAAGTTTGCCGCCTGGGAAGTACGTACGCAAGTATGAAACTTAAAGGAATTGGCGGGGGAGCACCACAAGGGGTGAAGCCTGCGGTTCAATTGGAGTCAACGCCAGAAATCTTACCCGGAGAGACAGCAGAATGAAGGTCAAGCTGAAGACTTTACCAGACAAGCTGAGAGGTGGTGCATGGCCGTCGCCAGCTCGTGCCGTGAGATGTCCTGCTAAGTCAGGTAACGAGCGAGATCCCTGCCTCTAGTTGCCACCATTACTCTCAGGAGTAGTGGGGCGAATTAGCGGGACCGCCGCAGTTAATGCGGAGGAAGGAAGGGGCCACGGCAGGTCAGTATGCCCCGAAACTCTGGGGCCACACGCGGGCTGCAATGGTAACGACAATTGGTTTCGAATCCGAAAGGATGAGGTAATCCTCAAACGTTACCACAGTTATGACTGAGGGCTGCAACTCGCCCTCACGAATATGGAATCCCTAGTAACTGCGTGTCATTATCGCGCGGTGAATACGTCCCTGCTCCTT,,5eb946fc85a2e16f40b2de67dbff627b
8988361,URS00008926C9,2015-10-20T18:04:07.000+0000,RNACEN,AE161A21AF6713C0,1367,AGCCCAGCTTGCTGGGTGGATTAGTGGCGAACGGGTGAGTAACACGTGAGTAACCTGCCCTTGACTCTGGGATAAGCCTGGGAAACTGGGTCTAATACCGGATAGGAACGTCCACCGCATGGTGGGTGTTGGAAAGATTTATCGGTCATGGATGGACTCGCGGCCTATCAGCTTGTTGGTGAGGTAATGGCTCACCAAGGCGACGACGGGTAGCCGGCCTGAGAGGGTGACCGGCCACACTGGGACTGAGACACGGCCCAGACTCCTACGGGAGGCAGCAGTGGGGAATATTGCACAATGGGCGAAAGCCTGATGCAGCGACGCCGCGTGAGGGATGACGGCCTTCGGGTTGTAAACCTCTTTCAGTAGGGAAGAAGCGAAAGTGACGGTACCTGCAGAAGAAGCACCGGCTAACTACGTGCCAGCAGCCGCGGTAATACGTAGGGTGCGAGCGTTATCCGGAATTATTGGGCGTAAAGAGCTCGTAGGCGGTTTGTCGCGTCTGTCGTGAAAGTCCGGGGCTTAACCCCGGATCTGCGGTGGGTACGGGCAGACTAGAGTGCAGTAGGGGAGACTGGAATTCCTGGTGTAGCGGTGGAATGCGCAGATATCAGGAGGAACACCGATGGCGAAGGCAGGTCTCTGGGCTGTAACTGACGCTGAGGAGCGAAAGCATGGGGAGCGAACAGGATTAGATACCCTGGTAGTCCATGCCGTAAACGTTGGGCACTAGGTGTGGGGACCATTCCACGGTTTCCGCGCCGCAGCTAACGCATTAAGTGCCCCGCCTGGGGAGTACGGCCGCAAGGCTAAAACTCAAAGGAATTGACGGGGGCCCGCACAAGCGGCGGAGCATGCGGATTAATTCGATGCAACGCGAAGAACCTTACCAAGGCTTGACATGTTCTCGATCGCCGTAGAGATACGGTTTCCCCTTTGGGGCGGGATCACAGGTGGTGCATGGTTGTCGTCAGCTCGTGTCGTGAGATGTTGGGTTAAGTCCCGCAACGAGCGCAACCCTCGTTCCATGTTGCCAGCACGTAATGGTGGGGACTCATGGGAGACTGCCGGGGTCAACTCGGAGGAAGGTGGGGACGACGTCAAATCATCATGCCCCTTATGTCTTGGGCTTCACGCATGCTACAATGGCCGGTACAATGGGTTGCGATACTGTGAGGTGGAGCTAATCCCAAAAAGCCGGTCTCAGTTCGGATTGGGGTCTGCAACTCGACCCCATGAAGTCGGAGTCGCTAGTAATCGCAGATCAGCAACGCTGCGGTGAATACGTTCCCGGGCCTTGTACACACCGCCCGTCAANTCACGAAAGTCGGTAACACCCGAAGCCGGTGGCCTAACCCTTGTGGGGG,,fe4849b1977b5be3c2a06183c24788c2
8988362,URS00008926CA,2015-10-20T18:04:07.000+0000,RNACEN,03DF15DE82E78D7F,1398,GAGTTTGATCATGGCTCAGGATGAACGCTGGCTACAGGCTTAACACATGCAAGTCGAGGGGAAACGGCATTTGGTGCTTGCACCGAATGGACGTCGACCGGCGCACGGGTGAGTAACGCGTATCCAACCTTCCCGTTACTGCGGGATAACCTGCCGAAAGGCAGACTAATACCGCATGTTCTTCGATGACGGCATCAGATTCGAAGCAAAGATCCGTCGGTAACGGAGGGGGATGCGTCTGATTAGCTAGTTGGCGGGGCGACGGCCCACCAAGGCGACGATCAGTAGGGGTTCTGAGAGGAAGGTCCCCCACATTGGAACTGAGACACGGTCCAAACTCCTACGGGAGGCAGCAGTGAGGAATATTGGTCAATGGGCGGAAGCCTGAACCAGCCAAGTAGCGTGCAGGATGACGGCCCTACGGGTTGTAAACTGCTTTTATGCGGGGATAAAGTGAGGGACGCGTCCCTTTTTGCAGGTACCGCATGAATAAGGACCGGCTAATTCCGTGCCAGCAGCCGCGGTAATACGGAAGGTCCGGGCGTTATCCGGATTTATTGGGTTTAAAGGGAGCGTAGGCCGGGGATTAAGTGTGTTGTGAAATGTAGGCGCCCAACGTCTGACTTGCAGCGCATACTGGTTCCCTTGAGTACGCGCAACGCCGGCGGAATTCGTCGTGTAGCGGTGAAATGCTTAGATATGACGAAGAACCCCGATTGCGAAGGCAGCCGGCGGGAGCGCAACTGACGCTGAAGCTCGAAGGTGCGGGTATCGAACAGGATTAGATACCCTGGTAGTCCGCACGGTAAACGATGGATGCCCGCTGTCGGCGCCTTGCGCCGGCGGCCAAGCGAAAGCGTTAAGCATCCCACCTGGGGAGTACGCCGGCAACGGTGAAACTCAAAGGAATTGACGGGGGCCCGCACAAGCGGAGGAACATGTGGTTTAATTCGATGATACGCGAGGAACCTTACCCGGGCTTGAATCGCAGGAGAACGAAACAGAGATGTTGAGGTCCTTCGGGACTCCTGCGAAGGTGCTGCATGGTCGTCGTCAGCTCGTGCCGTGAGGTGTCGGCTTAAGTGCCATAACGAGCGCAACCCCTCTCCCCAGTTGCCATCGGGTGATGCCGGGCACTCCGGGGACACTGCCGCCGCAAGGTGCGAGGAAGGCGGGGATGACGTCAAATCAGCACGGCCCTTACGTCCGGGGCTACACACGTGTTACAATGGCCGGCACAGAGTGTCGGTGCGGCGCGAGCCGCATCTAATCTTGAAAACCGGTCTCAGTTCGGACTGGGGTCTGCAACCCGACCCCACGAAGCTGGATTCGCTAGTAATCGCGCATCAGCCACGGCGCGGTGAATACGTTCCCGGGCCTTGCACACACCGCCCGTCA,,c4bb7b410de36a58cfe8a7f49c170fd5
8988364,URS00008926CC,2015-10-20T18:04:07.000+0000,RNACEN,AE0439B061E1640E,1409,GTCGAACGGTAACAGGAAGCAGCTTGCTGCTTTGCTGACGAGTGGCGGACGGGTGAGTAATGTCTGGGAAACTGCCTGATGGAGGGGGATAACTACTGGAAACGGTGGCTAATACCGCATAACGTCGCAAGACCAAAGAGGGGGACCTTCGGGCCTCTTGCCATCAGATGTGCCCAGATGGGATTAGCTTGTTGGTGAGGTAACGGCTCACCAAGGCGACGATCCCTAGCTGGTCTGAGAGGATGACCAGCCACACTGGAACTGAGACACGGTCCAGACTCCTACGGGAGGCAGCAGTGGGGAATATTGCACAATGGGCGCAAGCCTGATGCAGCCATGCCGCGTGTATGAAGAAGGCCTTCGGGTTGTAAAGTACTTTCAGCGGGGAGGAAGGTGTTGTGGTTAATAACCGCAGCAATTGACGTTACCCGCAGAAGAAGCACCGGCTAACTCCGTGCCAGCAGCCGCGGTAATACGGAGGGTGCAAGCGTTAATCGGAATTACTGGGCGTAAAGCGCACGCAGGCGGTCTGTCAAGTCGGATGTGAAATCCCCGGGCTCAACCTGGGAACTGCATTCGAAACTGGCAGGCTTGAGTCTTGTAGAGGGGGGTAGAATTCCAGGTGTAGCGGTGAAATGCGTAGAGATCTGGAGGAATACCGGTGGCGAAGGCGGCCCCCTGGACAAAGACTGACGCTCAGGTGCGAAAGCGTGGGGAGCAAACAGGATTAGATACCCTGGTAGTCCACGCCGTAAACGATGTCTACTTGGAGGTTGTGCCCTTGAGGCGTGGCTTCCGGAGCTAACGCGTTAAGTAGACCGCCTGGGGAGTACGGCCGCAAGGTTAAAACTCAAATGAATTGACGGGGGCCCGCACAAGCGGTGGAGCATGTGGTTTAATTCGATGCAACGCGAAGAACCTTACCTGGTCTTGACATCCACAGAACTTTCCAGAGATGGACTGGTGCCTTCGGGAACTGTGAGACAGGTGCTGCATGGCTGTCGTCAGCTCGTGTTGTGAAATGTTGGGTTAAGTCCCGCAACGAGCGCAACCCTTATCCTTTGTTGCCAGCGATTAGGTCGGGAACTCAAAGGAGACTGCCAGTGATAAACTGGAGGAAGGTGGGGATGACGTCAAGTCATCATGGCCCTTACGACCAGGGCTACACACGTGCTACAATGGCGCATACAAAGAGAAGCGACCTCGCGAGAGCAAGCGGACCTCATAAAGTGCGTCGTAGTCCGGATTGGAGTCTGCAACTCGACTCCATGAAGTCGGAATCGCTAGTAATCGTGGATCAGAATGCCACGGTGAATACGTTCCCGGGCCTTGTACACACCGCCCGTCACACCATGGGAGTGGGTTGCAAAAGAAGTAGGTAGCTTAACCTTCGGGAGGGCGCTTACCAC,,5ebae3845f9c331ae71786ed8c0fa100
11797345,URS0000B40361,2017-10-13T16:48:29.334+0000,rnacen,7DEEC8E7492E0C07,426,GTGAGGAATATTGGTCAATGGGCGAGAGCCTGAACCAGCCAAGTAGCGTGCAGGATGACGGCCCTATGGGTTGTAAACTGCTTTTATGCGGGGATAAAGTGCGCGACGTGTCGTGCATTGCAGGTACCGCATGAATAAGGACCGGCTAATTCCGTGCCAGCAGCCGCGGTAATACGGAAGGTCCGGGCGTTATCCGGATTTATTGGGTTTAAAGGGAGCGTAGGCCGCCAGGTAAGCGTGTTGTGAAATGTACCGGCTCAACCGGTGAATTGCAGCGCGAACTGTCTGGCTTGAGTGCACGGTAAGCAGGCGGAATTCATGGTGTAGCGGTGAAATGCTTAGATATCATGAAGAACTCCGATTGCGAAGGCAGCTTGCTGCAGTGCGACTGACGCTGATGCTCGAAGGTGCGGGTATCAAACAGGA,,1a56bf4ac54397a3fe81b23e3f14edc1
8987869,URS00008924DD,2015-10-20T18:04:07.000+0000,RNACEN,501C63AF85F4994B,1378,TGATCCTGGCGCAGGATGAACGGTGGCGGCGTGCTTAACACATGCAAGTCGAACGATGACTTTTGTGCTTGCACAAAATGATTAGTGGCGAACGGGTGAGTAACACGTGAGTAACCTGCCCTTAACTTCGGGATAAGCCTGGGAAACCGGGTCTAATACCGGATACGACGGATCACCGCATGGCGGTCCGTGGTAAGCTTGATGCGGTTTTGGATGGACTCGCGGCCTATCAGCTAGTTGGTTGGGGTAATGGCCCACCAAGGCGACGACGGGTAGCCGGCCTGAGAGGGTGACCGGCCACACTGGGACTGAGACACGGCCCAGACTCCTACGGGAGGCAGCAGTGGGGAATATTGCACAATGGGCGAAAGCCTGATGCAGCGACGCCGCGTGAGGGACGAAGGCCTTCGGGTTGTAAACCTCTTTCAGCAGGGAAGAAGCGAAAGTGACGGTACCTGCAGAAGAAGCGCCGGCTAACTACGTGCCAGCAGCCGCGGTAATACGTAGGGCGCAAGCGTTATCCGGAATTATTGGGCGTAAAGAGCTCGTAGGCGGTTTGTCGCGTCTGCTGTGAAAGCCCGGGGCTCAACCCCGGGTCTGCAGTGGGTACGGGCAGACTAGAGTGCAGTAGGGGAGACTGGAATTCCTGGTGTAGCGGTGAAATGCGCAGATATCAGGAGGAACACCGATGGCGAAGGCAGGTCTCTGGGCTGTAACTGACGCTGAGGAGCGAAAGCATGGGGAGCGAACAGGATTAGATACCCTGGTAGTCCATGCCGTAAACGTTGGGCACTAGGTGTGGGGGACATTCCACGTTTTCCGCGCCGTAGCTAACGCATTAAGTGCCCCGCCTGGGGAGTACGGCCGCAAGGCTAAAACTCAAAGGAATTGACGGGGGCCCGCACAAGCGGCGGAGCATGCGGATTAATTCGATGCAACGCGAAGAACCTTACCAAGGCTTGACATGAACCGGAAAGGCCTGGAAACAGGTCCCCCACTTGTGGCCGGTTTACAGGTGGTGCATGGTTGTCGTCAGCTCGTGTCGTGAGATGTTGGGTTAAGTCCCGCAACGAGCGCAACCCTCGTTCTATGTTGCCAGCGGGTTATGCCGGGGACTCATAGGAGACTGCCGGGGTCAACTCGGAGGAAGGTGGGGACGACGTCAAATCATCATGCCCCTTATGTCTTGGGCTTCACGCATGCTACAATGGCCGGTACAAAGGGTTGCGATACTGTGAGGTGGAGCTAATCCCAAAAAGCCGGTCTCAGTTCGGATTGAGGTCTGCAACTCGACCTCATGAAGTTGGAGTCGCTAGTAATCGCAGATCAGCAACGCTGCGGTGAATACGTTCCCGGGCCTTGTACACACCGCCCGTCA,,c449606151afed7968c2941f1fd7de5c
8987870,URS00008924DE,2015-10-20T18:04:07.000+0000,RNACEN,E3CAEB752E19EF2A,1370,TTAGAGTTTGATCCTGGCTCAGGACGACCTCTGGCGGCGTGCCTAACACATGCAAGTCGAACGAGGAATATTTTTCGAAATATTCTTAGTGGCGGACGGGTGAGTAACGCGTGAACAATCTGCCCTGTACAAAGGAATACCCTCGGGAAACCGGGATTAAAACCGTATGATACTTTGATGCCGCATGGCAATGAAGTCAAATATTTATAGGTATGGGATGAGTTCGCGTCTGATTAGCTGGTTGGTGAGGTAAAGGCCCACCAAGGCGACGATCAGTAGCCGGCCTGAGAGGGTGAACGGCCACACTGGAACTGAGACACGGTCCAGACTCCTACGGGAGGCAGCAGTGGGGAATATTGCACAATGGAGGGAACTCTGATGCAGCGACGCCGCGTGAGTGAAGAAGGTCTTCGGATCGTAAAACTCTGTCCTTGGTGAAGAAAAGGACGGTAGCCAAGGAGGAAGCCCCGGCTAACTACGTGCCAGCAGCCGCGGTAATACGTAGGGGGCGAGCGTTGTCCGGATTTACTGGGCGTAAAGGGTGAGTAGGCGGTAATATATGTCAGGTGTAAAAGATCATGGCTTAACCATGGTTAGCACTTGAAACTGTATGACTTGAGTGCAGGAGAGGTAAGCGGAATTCCTAGTGTAGCGGTGAAATGCGTAGATATTAGGAGGAACACCAGTGGCGAAGGCGGCTTACTGGACTGTAACTGACGCTGAGTCACGAAAGCGTGGGTAGCAAAGAGGATTAGATACCCTGGTAGTCCACGCGTAACCCATAGGGGTAGGTGTTGGGTAGCGATATTCAGTGCCGTAGTAAACACAATAAGCACTCCGCCTGGGGAGTACGTACGCAAGTATAAAACTCAAAGGAATTGACGGGGACCCGCACAAGCAGCGGAGCATGTGGTTTAATTCGAAGCAACGCGAAGAACCTTACCAAGGCTTGACATCCCCTTGACAGATGCAGAGATGTGTCCTCTCCTTCGGGAGCAAGGGAGACAGGTGGTGCATGGTTGTCGTCAGCTCGTGTCGTGAGATGTTGGGTTAAGTCCCGCAACGAGCGCAACCCTTATCCTTAGTTGCCATCAAGTTAAGTTGGGCACTCTAAGGAGACTGCCGGTGATAAACCGGAGGAAGGTGGGGATGACGTCAAATCATCATGCCCCTTATGTCTTGGGCTACACACGTGCTACAATGGTCGGTACAACGGGGAGCGAAGGAGCGATCCCAAGCAAATCCCAATAAACCGATCCCAGTTCGGATTGCAGGCTGCAACTCGCCTGCATGAAGTCGGAGTTGCTAGTAATCGCGAATCAGAACGTCGCGGTGAATGCGTTCCCGGGTCTTGTACACACCGCCCGTCA,,5e6050447aa0077452539fde5398aec8
8988018,URS0000892572,2015-10-20T18:04:07.000+0000,RNACEN,9FEF66572DBF356C,1400,CAGGATGAACGCTAGCGGGAGGCCTAATACATGCAAGTCGAGCGGTAGAACTAGCTTCGGTTGGTTTGAGAGCGGCGCACGGGTGAGTAACGCGTACGTAACCTGCCCTTCAGTGGGGAATAGCCCCGGGAAACTGGGATTAATGCCCCATGGTACTTTCGATCTGCCTGGATTGAAAGTTAAAGCTTCGGCGCTGAAGGATGGACGTGCGTCTGATTAGCTGGTTGGTGAGGTAACGGCTCACCAAGGCGACGATCAGTAGGGGGCGTGAGAGCGTGACCCCCCACACGGGTACTGAGACACGGACCCGACTCCTACGGGAGGCAGCAGTAAGGAATATTGGGCAATGGAGGCAACTCTGACCCAGCCATCCCGCGTGCAGGAAGACGGCCCTACGGGTTGTAAACTGCTTTTATGAAGGAAGAAAGGTTGGCATTTATGCTGATTTGACGGTACTTCAGGAATAAGCACCGGCTAACTCCGTGCCAGCAGCCGCGGTAATACGGAGGGTGCGAGCGTTATCCGGAATCACTGGGTTTAAAGGGTGCGTAGGCGGCCTGATAAGTCAGAGGTGAAAGTCTGCGGCTTAACCGTAGAATTGCCTTTGATACTGTTGGGCTTGAGTCAGGTTGAGGTTGGCGGAATGTGACATGTAGCGGTGAAATGCATAGATATGTCATAGAACACCGATTGCGAAGGCAGCTGACTGGACCTGAACTGACGCTGAGGCACGAAAGCGTGGGGAGCGAACAGGATTAGATACCCTGGTAGTCCACGCCCTAAACGATGCTCACTCGATATGCGATCCGTAGGATTGCGTGTCCAAGCGAAAGCGTTAAGTGAGCCACCTGGGGAGTACGTCGGCAACGATGAAACTCAAAGGAATTGACGGGGGTCCGCACAAGCGGTGGAGCATGTGGTTTAATTCGATGATACGCGAGGAACCTTACCTGGGCTAGAATGCGAGTGACCGGCCCTGAAAGGGGCTTTTCCTTCGGGACACAAAGCAAGGTGCTGCATGGCTGTCGTCAGCTCGTGCCGTGAGGTGTTGGGTTAAGTCCCGCAACGAGCGCAACCCTTGTCCTTAGTTGCCAGCACTTCGGGTGGGGACTCTAAGGAGACTGCCGGCGCAAGCCGCGAGGAAGGTGGGGATGACGTCAAGTCATCATGGCCTTTATGCCCAGGGCTACACACGTGCTACAATGGCCAGTACAGAGGGTAGCGAAGCCGCGAGGTGAAGCCAATCCCAGAAAGCTGGTCCCAGTTCGGATTGGAGTCTGGAACTCGACTCCATGAAGGTGGAATCGCTAGTAATCGCGCATCAGCCATGGTGCGGTGAATACGTTCCCGGACCTTGTACACACCGCCCGTCAAACCATGGGAGCCGGGGGTGCCTGAAG,,5e7ee4bb62cfcdd77a94fd8d93b88f42
8988488,URS0000892748,2015-10-20T18:04:07.000+0000,RNACEN,FA606F4B0ABA4DC4,1478,GAGTTTGATCCTGGCTCAGGATGAACGCTGGCGGCGTGCCTAATACATGCAAGTCGAACGCTAACCTTCGGGTTAGAGTGGCGAACGGGTGAGTAACACGTAGGTAACCTGCCCATAAGACGAGGATAACTACTGGAAACGGTAGCTAATACTGGATAGTATATAGAATCGCATGATTTTATATTTAAAGATGCGTTTGCATCACTTATGGATGGACCTGCGGCGCATTAGCTAGTTGGTGAGATAACGGCCCACCAAGGCGACGATGCGTAGCCGGACTGAGAGGTCGAACGGCCACATTGGGACTGAGACACGGCCCAGACTCCTACGGGAGGCAGCAGTAGGGAATTTTCGGCAATGGACGCAAGTCTGACCGAGCAACGCCGCGTGAGTGATGAAGTTCTTCGGAACGTAAAATTCTTTTATTTGGAAAAAACGTATAGTGTAGGAAATGACATTATAGTGATGGTACCAAATGAATAAGCCCCGGCTAACTATGTGCCAGCAGCCGCGGTAATACATAGGGGGCGAGCGTTATCCGGATTTATTGGGCGTAAAGGGTGCGTAGGCGGTAGATTAAGTCTAAGGTTAAAGTGCAGGGCTCAACCCTGTGATGCCTTAGAAACTGGTTTACTTGAGTTTGGTAGAGGTAAGTGGAACTCCATGTGTAGCGGTAAAATGCGTAAATATATGGAAGAACACCAGTGGCGAAGGCGGCTTACTGGGCCACAACTGACGCTGAGGCACGAAAGCGTGGGGAGCAAACAGGATTAGATACCCTGGTAGTCCACGCTGTAAACGATGAATACTAAGTGTTGGAAAAATCCAGTGCTGAAGTTAACGCATTAAGTATTCCGCCTGAGTAGTACGTACGCAAGTATGAAACTCAAAGGAATTGACGGGACCCCGCACAAGCGGTGGAGCATGTTGTTTAATTCGAAGATACGCGAAGAACCTTACCAGGTCTTGACATCCGTTGCAAAGCTATAGAGATATAGTGGAGGTTAACAGCGAGACAGGTGGTGCATGGTTGTCGTCAGCTCGTGTCGTGAGATGTTGGGTTAAGTCCCGCAACGAGCGCAACCCTTGTTGCTAGTTACCATCATTTAGTTGGGGACTCTAGCGAGACTGCCGGTGATAAACCGGAGGAAGGTGGGGATGACGTCAAATCATCATGCCCCTTATGACCTGGGCTACAAACGTGCTACAATGGCCACTACAATGAGAGCCGATACCGCGAGGTGGAGGAAAACTGATAAAAGTGGTCTCAGTTCGGATTGAACTCTGCAACTCGAGTTCATGAAGTTGGAATCGCTAGTAATCGTGAATCAGAATGTCACGGTGAATACGTTCTCGGGGTTTGTACACACCGCCCGTCAAGCCATGGGAGTTTGCAATACCCAAAGCCGGTGGCCTAACCGCAAGGAGGGAGCCGTCTAAGGTAGGGCAAATGACTGGGGTTAAGTCGTAACAAGGTA,,b7bcd6b9fbd22a1e8d343a0120bbbf80
