<a href="https://colab.research.google.com/github/roitraining/SparkProgram/blob/Day2/Day2/Ch03_DataFrames.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

Set up the Spark environment.

In [0]:
import sys
sys.path.append('/home/student/ROI/SparkProgram')
from initspark import *
sc, spark, conf = initspark()

Turn a simple RDD into a DataFrame. 

In [0]:
x = sc.parallelize([(1,'alpha'),(2,'beta')])
x0 = spark.createDataFrame(x)
x0.show()

+---+-----+
| _1|   _2|
+---+-----+
|  1|alpha|
|  2| beta|
+---+-----+



Give the DataFrame meaningful column names.

In [0]:
x1 = spark.createDataFrame(x, schema=['ID','Name'])
x1.show()
print(x1)

+---+-----+
| ID| Name|
+---+-----+
|  1|alpha|
|  2| beta|
+---+-----+

DataFrame[ID: bigint, Name: string]


Give a DataFrame a schema with column names and data types.

In [0]:
x2 = spark.createDataFrame(x, 'ID:int, Name:string')
x2.show()
print(x2)

+---+-----+
| ID| Name|
+---+-----+
|  1|alpha|
|  2| beta|
+---+-----+

DataFrame[ID: int, Name: string]
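
The two outputs above differ in the type of ID: `bigint` (a 64-bit signed integer, which is what was inferred for the Python ints) versus `int` (a 32-bit signed integer, as requested by the schema string). A minimal sketch of the difference in range, using a hypothetical helper `fits_int32` that is not part of Spark:

```python
# Range of a 32-bit signed integer, the size of Spark's int type.
INT32_MIN, INT32_MAX = -2**31, 2**31 - 1

def fits_int32(value):
    # Hypothetical helper: True when value can be stored in a 32-bit signed int.
    return INT32_MIN <= value <= INT32_MAX

print(fits_int32(2), fits_int32(3_000_000_000))
```

The sample IDs fit either type; the narrower `int` is only a concern for values beyond roughly 2.1 billion.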


Load a text file into an RDD and clean it up as before.

In [0]:
filename = '/home/student/ROI/SparkProgram/datasets/finance/CreditCard.csv'
cc = sc.textFile(filename)
first = cc.first()
cc = cc.filter(lambda x : x != first)
cc.take(10)


['"Delhi, India",29-Oct-14,Gold,Bills,F,82475',
 '"Greater Mumbai, India",22-Aug-14,Platinum,Bills,F,32555',
 '"Bengaluru, India",27-Aug-14,Silver,Bills,F,101738',
 '"Greater Mumbai, India",12-Apr-14,Signature,Bills,F,123424',
 '"Bengaluru, India",5-May-15,Gold,Bills,F,171574',
 '"Delhi, India",8-Sep-14,Silver,Bills,F,100036',
 '"Delhi, India",24-Feb-15,Gold,Bills,F,143250',
 '"Greater Mumbai, India",26-Jun-14,Platinum,Bills,F,150980',
 '"Delhi, India",28-Mar-14,Silver,Bills,F,192247',
 '"Delhi, India",1-Sep-14,Platinum,Bills,F,67932']

In [0]:
import datetime
cc = cc.map(lambda x : x.split(',')) 
cc.take(10)

[['"Delhi', ' India"', '29-Oct-14', 'Gold', 'Bills', 'F', '82475'],
 ['"Greater Mumbai',
  ' India"',
  '22-Aug-14',
  'Platinum',
  'Bills',
  'F',
  '32555'],
 ['"Bengaluru', ' India"', '27-Aug-14', 'Silver', 'Bills', 'F', '101738'],
 ['"Greater Mumbai',
  ' India"',
  '12-Apr-14',
  'Signature',
  'Bills',
  'F',
  '123424'],
 ['"Bengaluru', ' India"', '5-May-15', 'Gold', 'Bills', 'F', '171574'],
 ['"Delhi', ' India"', '8-Sep-14', 'Silver', 'Bills', 'F', '100036'],
 ['"Delhi', ' India"', '24-Feb-15', 'Gold', 'Bills', 'F', '143250'],
 ['"Greater Mumbai',
  ' India"',
  '26-Jun-14',
  'Platinum',
  'Bills',
  'F',
  '150980'],
 ['"Delhi', ' India"', '28-Mar-14', 'Silver', 'Bills', 'F', '192247'],
 ['"Delhi', ' India"', '1-Sep-14', 'Platinum', 'Bills', 'F', '67932']]
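
Splitting on every comma cuts the quoted "City, Country" field in two, which is why the next step has to strip stray quotes and spaces. As an alternative sketch (not what this notebook does), Python's standard `csv` module honors the quotes. The helper name `parse_line` is an assumption for illustration.

```python
import csv

def parse_line(line):
    # csv.reader takes an iterable of lines, so wrap the single line in a list.
    return next(csv.reader([line]))

fields = parse_line('"Delhi, India",29-Oct-14,Gold,Bills,F,82475')
print(fields)
```

A function like this could be passed to `cc.map` in place of the `split(',')` lambda, in which case the city and country would arrive together as one field.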

In [0]:
cc = cc.map(lambda x : (x[0][1:], x[1][1:-1], datetime.datetime.strptime(x[2], '%d-%b-%y').date(), x[3], x[4], x[5], float(x[6])))
print (cc.collect())

[('Delhi', 'India', datetime.date(2014, 10, 29), 'Gold', 'Bills', 'F', 82475.0), ('Greater Mumbai', 'India', datetime.date(2014, 8, 22), 'Platinum', 'Bills', 'F', 32555.0), ('Bengaluru', 'India', datetime.date(2014, 8, 27), 'Silver', 'Bills', 'F', 101738.0), ('Greater Mumbai', 'India', datetime.date(2014, 4, 12), 'Signature', 'Bills', 'F', 123424.0), ('Bengaluru', 'India', datetime.date(2015, 5, 5), 'Gold', 'Bills', 'F', 171574.0), ('Delhi', 'India', datetime.date(2014, 9, 8), 'Silver', 'Bills', 'F', 100036.0), ('Delhi', 'India', datetime.date(2015, 2, 24), 'Gold', 'Bills', 'F', 143250.0), ('Greater Mumbai', 'India', datetime.date(2014, 6, 26), 'Platinum', 'Bills', 'F', 150980.0), ('Delhi', 'India', datetime.date(2014, 3, 28), 'Silver', 'Bills', 'F', 192247.0), ('Delhi', 'India', datetime.date(2014, 9, 1), 'Platinum', 'Bills', 'F', 67932.0), ('Delhi', 'India', datetime.date(2014, 6, 22), 'Platinum', 'Bills', 'F', 280061.0), ('Greater Mumbai', 'India', datetime.date(2013, 12, 7), 'Signa
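
The one-line lambda above packs several conversions together. A sketch of the same logic as a named function (the name `to_record` is an assumption for illustration) makes each step easier to read and lets it be tried on a single row without Spark.

```python
import datetime

def to_record(x):
    # x is one split line, e.g. ['"Delhi', ' India"', '29-Oct-14', 'Gold', 'Bills', 'F', '82475']
    city = x[0][1:]        # drop the opening quote
    country = x[1][1:-1]   # drop the leading space and the closing quote
    # %b matches English month abbreviations in the default C locale.
    date = datetime.datetime.strptime(x[2], '%d-%b-%y').date()
    return (city, country, date, x[3], x[4], x[5], float(x[6]))

row = to_record(['"Delhi', ' India"', '29-Oct-14', 'Gold', 'Bills', 'F', '82475'])
print(row)
```

Because it is an ordinary function, it could be used as `cc.map(to_record)` in place of the lambda.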

Turn the RDD into a DataFrame.

In [0]:
df = spark.createDataFrame(cc)
df.show()

+--------------+-----+----------+---------+-----+---+--------+
|            _1|   _2|        _3|       _4|   _5| _6|      _7|
+--------------+-----+----------+---------+-----+---+--------+
|         Delhi|India|2014-10-29|     Gold|Bills|  F| 82475.0|
|Greater Mumbai|India|2014-08-22| Platinum|Bills|  F| 32555.0|
|     Bengaluru|India|2014-08-27|   Silver|Bills|  F|101738.0|
|Greater Mumbai|India|2014-04-12|Signature|Bills|  F|123424.0|
|     Bengaluru|India|2015-05-05|     Gold|Bills|  F|171574.0|
|         Delhi|India|2014-09-08|   Silver|Bills|  F|100036.0|
|         Delhi|India|2015-02-24|     Gold|Bills|  F|143250.0|
|Greater Mumbai|India|2014-06-26| Platinum|Bills|  F|150980.0|
|         Delhi|India|2014-03-28|   Silver|Bills|  F|192247.0|
|         Delhi|India|2014-09-01| Platinum|Bills|  F| 67932.0|
|         Delhi|India|2014-06-22| Platinum|Bills|  F|280061.0|
|Greater Mumbai|India|2013-12-07|Signature|Bills|  F|278036.0|
|Greater Mumbai|India|2014-08-07|     Gold|Bills|  F| 1

The built-in toDF method does the same thing.

In [0]:
df = cc.toDF()
df.show()
print(df)

+--------------+-----+----------+---------+-----+---+--------+
|            _1|   _2|        _3|       _4|   _5| _6|      _7|
+--------------+-----+----------+---------+-----+---+--------+
|         Delhi|India|2014-10-29|     Gold|Bills|  F| 82475.0|
|Greater Mumbai|India|2014-08-22| Platinum|Bills|  F| 32555.0|
|     Bengaluru|India|2014-08-27|   Silver|Bills|  F|101738.0|
|Greater Mumbai|India|2014-04-12|Signature|Bills|  F|123424.0|
|     Bengaluru|India|2015-05-05|     Gold|Bills|  F|171574.0|
|         Delhi|India|2014-09-08|   Silver|Bills|  F|100036.0|
|         Delhi|India|2015-02-24|     Gold|Bills|  F|143250.0|
|Greater Mumbai|India|2014-06-26| Platinum|Bills|  F|150980.0|
|         Delhi|India|2014-03-28|   Silver|Bills|  F|192247.0|
|         Delhi|India|2014-09-01| Platinum|Bills|  F| 67932.0|
|         Delhi|India|2014-06-22| Platinum|Bills|  F|280061.0|
|Greater Mumbai|India|2013-12-07|Signature|Bills|  F|278036.0|
|Greater Mumbai|India|2014-08-07|     Gold|Bills|  F| 1

In [0]:
df = cc.toDF(['City', 'Country', 'Date', 'CardType', 'TranType', 'Gender', 'Amount'])
df.show()

+--------------+-------+----------+---------+--------+------+--------+
|          City|Country|      Date| CardType|TranType|Gender|  Amount|
+--------------+-------+----------+---------+--------+------+--------+
|         Delhi|  India|2014-10-29|     Gold|   Bills|     F| 82475.0|
|Greater Mumbai|  India|2014-08-22| Platinum|   Bills|     F| 32555.0|
|     Bengaluru|  India|2014-08-27|   Silver|   Bills|     F|101738.0|
|Greater Mumbai|  India|2014-04-12|Signature|   Bills|     F|123424.0|
|     Bengaluru|  India|2015-05-05|     Gold|   Bills|     F|171574.0|
|         Delhi|  India|2014-09-08|   Silver|   Bills|     F|100036.0|
|         Delhi|  India|2015-02-24|     Gold|   Bills|     F|143250.0|
|Greater Mumbai|  India|2014-06-26| Platinum|   Bills|     F|150980.0|
|         Delhi|  India|2014-03-28|   Silver|   Bills|     F|192247.0|
|         Delhi|  India|2014-09-01| Platinum|   Bills|     F| 67932.0|
|         Delhi|  India|2014-06-22| Platinum|   Bills|     F|280061.0|
|Great

In [0]:
df = cc.toDF('City: string, Country: string, Date: date, CardType: string, TranType: string, Gender: string, Amount: double')
df.show()
print(df)


+--------------+-------+----------+---------+--------+------+--------+
|          City|Country|      Date| CardType|TranType|Gender|  Amount|
+--------------+-------+----------+---------+--------+------+--------+
|         Delhi|  India|2014-10-29|     Gold|   Bills|     F| 82475.0|
|Greater Mumbai|  India|2014-08-22| Platinum|   Bills|     F| 32555.0|
|     Bengaluru|  India|2014-08-27|   Silver|   Bills|     F|101738.0|
|Greater Mumbai|  India|2014-04-12|Signature|   Bills|     F|123424.0|
|     Bengaluru|  India|2015-05-05|     Gold|   Bills|     F|171574.0|
|         Delhi|  India|2014-09-08|   Silver|   Bills|     F|100036.0|
|         Delhi|  India|2015-02-24|     Gold|   Bills|     F|143250.0|
|Greater Mumbai|  India|2014-06-26| Platinum|   Bills|     F|150980.0|
|         Delhi|  India|2014-03-28|   Silver|   Bills|     F|192247.0|
|         Delhi|  India|2014-09-01| Platinum|   Bills|     F| 67932.0|
|         Delhi|  India|2014-06-22| Platinum|   Bills|     F|280061.0|
|Great

**LAB:** Use the regions and territories RDDs from the previous lab and convert them into DataFrames with meaningful schemas.


Convert a DataFrame into a JSON string.

In [0]:
print (df.toJSON().take(10))

['{"City":"Delhi","Country":"India","Date":"2014-10-29","CardType":"Gold","TranType":"Bills","Gender":"F","Amount":82475.0}', '{"City":"Greater Mumbai","Country":"India","Date":"2014-08-22","CardType":"Platinum","TranType":"Bills","Gender":"F","Amount":32555.0}', '{"City":"Bengaluru","Country":"India","Date":"2014-08-27","CardType":"Silver","TranType":"Bills","Gender":"F","Amount":101738.0}', '{"City":"Greater Mumbai","Country":"India","Date":"2014-04-12","CardType":"Signature","TranType":"Bills","Gender":"F","Amount":123424.0}', '{"City":"Bengaluru","Country":"India","Date":"2015-05-05","CardType":"Gold","TranType":"Bills","Gender":"F","Amount":171574.0}', '{"City":"Delhi","Country":"India","Date":"2014-09-08","CardType":"Silver","TranType":"Bills","Gender":"F","Amount":100036.0}', '{"City":"Delhi","Country":"India","Date":"2015-02-24","CardType":"Gold","TranType":"Bills","Gender":"F","Amount":143250.0}', '{"City":"Greater Mumbai","Country":"India","Date":"2014-06-26","CardType":"Plat

In [0]:
df.printSchema()
print (df.columns, df.count())

root
 |-- City: string (nullable = true)
 |-- Country: string (nullable = true)
 |-- Date: date (nullable = true)
 |-- CardType: string (nullable = true)
 |-- TranType: string (nullable = true)
 |-- Gender: string (nullable = true)
 |-- Amount: double (nullable = true)

['City', 'Country', 'Date', 'CardType', 'TranType', 'Gender', 'Amount'] 26052


Choose particular columns from a DataFrame.

In [0]:
df.select('City', 'Country', 'Amount').show(10)

+--------------+-------+--------+
|          City|Country|  Amount|
+--------------+-------+--------+
|         Delhi|  India| 82475.0|
|Greater Mumbai|  India| 32555.0|
|     Bengaluru|  India|101738.0|
|Greater Mumbai|  India|123424.0|
|     Bengaluru|  India|171574.0|
|         Delhi|  India|100036.0|
|         Delhi|  India|143250.0|
|Greater Mumbai|  India|150980.0|
|         Delhi|  India|192247.0|
|         Delhi|  India| 67932.0|
+--------------+-------+--------+
only showing top 10 rows



In [0]:
df.select('City', 'Country').distinct().show()

+------------+-------+
|        City|Country|
+------------+-------+
|      Bhabua|  India|
|      Rajgir|  India|
|    Mahidpur|  India|
|   Brahmapur|  India|
|     Udaipur|  India|
|    Sunabeda|  India|
|     Kurnool|  India|
| Kodungallur|  India|
|    Surapura|  India|
|    Kashipur|  India|
|       Mansa|  India|
|Wadgaon Road|  India|
|   Lingsugur|  India|
|  Sultanganj|  India|
| Udaipurwati|  India|
| Nanjikottai|  India|
|       Buxar|  India|
|       Raver|  India|
|  Nedumangad|  India|
| Tilda Newra|  India|
+------------+-------+
only showing top 20 rows



Sort a DataFrame. The orderBy method is an alias for sort, so the two are interchangeable.

In [0]:
df.sort(df.Amount).show()
df.sort(df.Amount, ascending = False).show()
df.select('City', 'Amount').orderBy(df.City).show()

+--------------+-------+----------+---------+-------------+------+------+
|          City|Country|      Date| CardType|     TranType|Gender|Amount|
+--------------+-------+----------+---------+-------------+------+------+
|         Delhi|  India|2014-05-02| Platinum|      Grocery|     F|1005.0|
|   Murshidabad|  India|2014-06-12|   Silver|         Food|     M|1018.0|
|     Ahmedabad|  India|2015-01-19|Signature|      Grocery|     F|1024.0|
|       Lucknow|  India|2014-03-16|Signature|        Bills|     F|1026.0|
|     Ahmedabad|  India|2015-02-23|   Silver|         Food|     F|1028.0|
|     Ahmedabad|  India|2015-02-12|     Gold|         Fuel|     F|1038.0|
|Greater Mumbai|  India|2014-04-30|     Gold|        Bills|     F|1056.0|
|         Delhi|  India|2014-12-19|     Gold|Entertainment|     F|1061.0|
|       Chennai|  India|2014-07-23|     Gold|         Food|     F|1066.0|
|     Hyderabad|  India|2013-11-13|     Gold|       Travel|     F|1070.0|
|     Bengaluru|  India|2013-12-02|   

Create a new DataFrame with a new calculated column added.

In [0]:
df2 = df.withColumn('Discount', df.Amount * .03)
df2.show()

+--------------+-------+----------+---------+--------+------+--------+------------------+
|          City|Country|      Date| CardType|TranType|Gender|  Amount|          Discount|
+--------------+-------+----------+---------+--------+------+--------+------------------+
|         Delhi|  India|2014-10-29|     Gold|   Bills|     F| 82475.0|           2474.25|
|Greater Mumbai|  India|2014-08-22| Platinum|   Bills|     F| 32555.0|            976.65|
|     Bengaluru|  India|2014-08-27|   Silver|   Bills|     F|101738.0|           3052.14|
|Greater Mumbai|  India|2014-04-12|Signature|   Bills|     F|123424.0|           3702.72|
|     Bengaluru|  India|2015-05-05|     Gold|   Bills|     F|171574.0|           5147.22|
|         Delhi|  India|2014-09-08|   Silver|   Bills|     F|100036.0|           3001.08|
|         Delhi|  India|2015-02-24|     Gold|   Bills|     F|143250.0|            4297.5|
|Greater Mumbai|  India|2014-06-26| Platinum|   Bills|     F|150980.0|            4529.4|
|         

Remove an unwanted column from a DataFrame.

In [0]:
df3 = df2.drop(df2.Country)
df3.show()

+--------------+----------+---------+--------+------+--------+------------------+
|          City|      Date| CardType|TranType|Gender|  Amount|          Discount|
+--------------+----------+---------+--------+------+--------+------------------+
|         Delhi|2014-10-29|     Gold|   Bills|     F| 82475.0|           2474.25|
|Greater Mumbai|2014-08-22| Platinum|   Bills|     F| 32555.0|            976.65|
|     Bengaluru|2014-08-27|   Silver|   Bills|     F|101738.0|           3052.14|
|Greater Mumbai|2014-04-12|Signature|   Bills|     F|123424.0|           3702.72|
|     Bengaluru|2015-05-05|     Gold|   Bills|     F|171574.0|           5147.22|
|         Delhi|2014-09-08|   Silver|   Bills|     F|100036.0|           3001.08|
|         Delhi|2015-02-24|     Gold|   Bills|     F|143250.0|            4297.5|
|Greater Mumbai|2014-06-26| Platinum|   Bills|     F|150980.0|            4529.4|
|         Delhi|2014-03-28|   Silver|   Bills|     F|192247.0|           5767.41|
|         Delhi|

The filter and where methods are aliases of each other. The condition can be written either as a Column expression or as a SQL string.

In [0]:
df3.filter(df3.Amount < 4000).show()
print(df3.filter('Amount < 4000').count())
print(df3.where('Amount < 4000').count())
print(df3.where(df3.Amount < 4000).count())

print (df3.where((df3.Amount > 3000) & (df3.Amount < 4000)).count())
print (df3.where('Amount > 3000 and Amount < 4000').count())

+--------------+----------+---------+-------------+------+------+------------------+
|          City|      Date| CardType|     TranType|Gender|Amount|          Discount|
+--------------+----------+---------+-------------+------+------+------------------+
|     Bengaluru|2015-02-05| Platinum|        Bills|     F|3427.0|            102.81|
|Greater Mumbai|2015-04-27|Signature|        Bills|     F|2138.0|             64.14|
|     Bengaluru|2013-10-18|Signature|         Food|     F|2397.0|             71.91|
|     Bengaluru|2015-02-07|   Silver|         Fuel|     F|2686.0|             80.58|
|         Delhi|2014-01-23|   Silver|         Food|     F|1400.0|              42.0|
|     Ahmedabad|2013-10-31|   Silver|         Fuel|     F|2586.0|             77.58|
|     Bengaluru|2014-05-08|   Silver|        Bills|     F|2741.0|             82.23|
|     Bengaluru|2014-01-10|   Silver|         Food|     F|3421.0|            102.63|
|     Bengaluru|2015-04-29| Platinum|         Food|     F|3621.0|
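
The parentheses in `(df3.Amount > 3000) & (df3.Amount < 4000)` are required because Python's `&` binds more tightly than `>` and `<`. This can be checked without Spark by asking the standard `ast` module how Python parses each form. The variable name `a` is only a placeholder and `top_level` is a hypothetical helper.

```python
import ast

def top_level(expr):
    # Return the name of the outermost syntax node Python builds for expr.
    return type(ast.parse(expr, mode='eval').body).__name__

without_parens = top_level('a > 3000 & a < 4000')
with_parens = top_level('(a > 3000) & (a < 4000)')
print(without_parens, with_parens)
```

Without parentheses, Python reads the expression as a chained comparison around `3000 & a` rather than as an AND of two conditions, which is not what the filter intends.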


**LAB:** Using the df3 DataFrame, answer the following questions:

How many Platinum card purchases were there with a discount above $100?

Find the ten biggest discount amounts earned by women and show just the purchase amount, discount, and date.

Joins take a join condition and an optional join type; the default is an inner join.

In [0]:
tab1 = sc.parallelize([(1, 'Alpha'), (2, 'Beta'), (3, 'Delta')]).toDF('ID:int, code:string')
tab2 = sc.parallelize([(100, 'One', 1), (101, 'Two', 2), (102, 'Three', 1), (103, 'Four', 4)]).toDF('ID:int, name:string, parentID:int')
tab1.join(tab2, tab1.ID == tab2.parentID).show()
tab1.join(tab2, tab1.ID == tab2.parentID, 'left').show()
tab1.join(tab2, tab1.ID == tab2.parentID, 'right').show()
tab1.join(tab2, tab1.ID == tab2.parentID, 'full').show()


+---+-----+---+-----+--------+
| ID| code| ID| name|parentID|
+---+-----+---+-----+--------+
|  1|Alpha|100|  One|       1|
|  1|Alpha|102|Three|       1|
|  2| Beta|101|  Two|       2|
+---+-----+---+-----+--------+

+---+-----+----+-----+--------+
| ID| code|  ID| name|parentID|
+---+-----+----+-----+--------+
|  1|Alpha| 100|  One|       1|
|  1|Alpha| 102|Three|       1|
|  3|Delta|null| null|    null|
|  2| Beta| 101|  Two|       2|
+---+-----+----+-----+--------+

+----+-----+---+-----+--------+
|  ID| code| ID| name|parentID|
+----+-----+---+-----+--------+
|   1|Alpha|100|  One|       1|
|   1|Alpha|102|Three|       1|
|null| null|103| Four|       4|
|   2| Beta|101|  Two|       2|
+----+-----+---+-----+--------+

+----+-----+----+-----+--------+
|  ID| code|  ID| name|parentID|
+----+-----+----+-----+--------+
|   1|Alpha| 100|  One|       1|
|   1|Alpha| 102|Three|       1|
|   3|Delta|null| null|    null|
|null| null| 103| Four|       4|
|   2| Beta| 101|  Two|       2|
+---
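
To see why each join type returns the rows it does, here is a plain-Python sketch of inner and left joins over the same sample tuples. The function names are hypothetical, and this only illustrates the semantics; it is not how Spark executes a join, and Spark does not guarantee this row order.

```python
tab1 = [(1, 'Alpha'), (2, 'Beta'), (3, 'Delta')]
tab2 = [(100, 'One', 1), (101, 'Two', 2), (102, 'Three', 1), (103, 'Four', 4)]

def inner_join(left, right):
    # Keep only pairs where left ID equals right parentID.
    return [l + r for l in left for r in right if l[0] == r[2]]

def left_join(left, right):
    rows = []
    for l in left:
        matches = [r for r in right if l[0] == r[2]]
        if matches:
            rows.extend(l + r for r in matches)
        else:
            # No match: keep the left row and pad the right side with nulls.
            rows.append(l + (None, None, None))
    return rows

inner = inner_join(tab1, tab2)
left = left_join(tab1, tab2)
print(inner)
print(left)
```

A right join is the mirror image (unmatched `tab2` rows padded on the left), and a full join keeps the unmatched rows from both sides.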

Examples of aggregate functions.

In [0]:
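from pyspark.sql import functions as F

tab3 = sc.parallelize([(1, 10), (1, 20), (1, 30), (2, 40), (2, 50)]).toDF('groupID:int, amount:int')
tab3.groupby('groupID').max().show()
tab3.groupby('groupID').sum().show()
grouped = tab3.groupby('groupID')
# Caution: a dict cannot hold the key 'amount' twice, so Python keeps only the last entry and this computes max(amount) alone.
grouped.agg({'amount':'sum', 'amount':'max'}).show()
# Passing column functions allows several aggregates on the same column.
grouped.agg(F.sum('amount'), F.max('amount')).show()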

+-------+-----------+
|groupID|max(amount)|
+-------+-----------+
|      1|         30|
|      2|         50|
+-------+-----------+

+-------+-----------+-----------+
|groupID|sum(amount)|max(amount)|
+-------+-----------+-----------+
|      1|         60|         30|
|      2|         90|         50|
+-------+-----------+-----------+
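
The dictionary form of `agg` cannot request two aggregates on the same column, because Python keeps just the last value for a repeated key before Spark ever sees the dictionary. A quick check in plain Python, followed by a hand computation of the sum and max per group for the same sample rows (the variable names are chosen for this sketch):

```python
from collections import defaultdict

# Same literal as in the agg call: the second 'amount' entry replaces the first.
spec = {'amount': 'sum', 'amount': 'max'}
print(spec)

# Recompute the grouped sum and max by hand for comparison with the tables above.
rows = [(1, 10), (1, 20), (1, 30), (2, 40), (2, 50)]
groups = defaultdict(list)
for group_id, amount in rows:
    groups[group_id].append(amount)
summary = {g: (sum(v), max(v)) for g, v in groups.items()}
print(summary)
```

This is why the `F.sum` and `F.max` form is the one to use when several aggregates of one column are needed.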



Examples of reading a CSV directly into a DataFrame.

In [0]:
filename = '/home/student/ROI/SparkProgram/datasets/finance/CreditCard.csv'
df4 = spark.read.load(filename, format = 'csv', sep = ',', inferSchema = True, header = True)
df4.printSchema()

In [0]:
df4 = spark.read.format('csv').option('header','true').option('inferSchema','true').load(filename)
df4.printSchema()

In [0]:
df4 = spark.read.csv(filename, header = True, inferSchema = True)
df4.printSchema()

In [0]:
df4.show()

**LAB:** Read the Products file from the JSON folder and the categories file from the CSVHeaders folder, then join them, displaying just the product and category IDs and names, sorted by categoryID and then productID.

Hint: Drop the ambiguous column after the join.

Change the name of the City column to CityCountry.

In [0]:
cols = df4.columns
cols[0] = 'CityCountry'
df4 = df4.toDF(*cols)
df4.printSchema()

Apply custom UDFs to split the combined column into City and Country, and convert the Date column to a date data type.

In [0]:
from pyspark.sql.functions import udf, to_date
from pyspark.sql.types import *

def city(x):
    # Everything before the first comma.
    return x[:x.find(',')]
def country(x):
    # Everything after the first comma, without the leading space.
    return x[x.find(',') + 1 :].strip()

# 'd' accepts one- or two-digit days such as 5-May-15.
df5 = df4.withColumn('City', udf(city, StringType())(df4.CityCountry)) \
      .withColumn('Country', udf(country, StringType())(df4.CityCountry)) \
      .withColumn('Date', to_date(df4.Date, 'd-MMM-yy')) \
      .drop(df4.CityCountry)
df5.show()
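
Because a UDF wraps an ordinary Python function, the splitting logic can be tried on a sample string before it is run against the whole DataFrame. A minimal sketch with a hypothetical helper, `split_city_country`, that returns both parts at once; slicing after the comma alone would keep the space that follows it, so the sketch trims both parts.

```python
def split_city_country(value):
    # partition splits on the first comma only; strip removes surrounding spaces.
    city, _, country = value.partition(',')
    return city.strip(), country.strip()

raw = 'Greater Mumbai, India'
untrimmed = raw[raw.find(',') + 1:]   # slicing alone keeps the leading space
result = split_city_country(raw)
print(repr(untrimmed), result)
```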

DataFrames can be written to a variety of file formats. Here we write it as JSON. Note that Spark creates a directory of part files at this path, not a single file.

In [0]:
df5.write.json('/home/student/Desktop/CreditCard.json')

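Read the JSON output back into a DataFrame, but note that we lose some of the data types. JSON has no date type, so without a schema the Date column is read back as a string.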

In [0]:
df6 = spark.read.json('/home/student/Desktop/CreditCard.json')
df6.printSchema()
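
The same effect can be reproduced outside Spark with a plain-Python round trip through the `json` module. The record below is modeled on the first row of the data, and `default=str` is this sketch's choice for serializing the date.

```python
import datetime
import json

record = {'City': 'Delhi', 'Date': datetime.date(2014, 10, 29), 'Amount': 82475.0}
text = json.dumps(record, default=str)  # the date is written as the string "2014-10-29"
restored = json.loads(text)
print(type(restored['Date']).__name__, type(restored['Amount']).__name__)
```

The number survives as a number, but the date comes back as text, which is why an explicit schema is needed to recover it.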

Create a schema that can be used to import a file and directly name the columns and convert them to the desired data type.

In [0]:
schema = StructType([
    StructField('Date', DateType()), 
    StructField('Card Type', StringType()),
    StructField('Exp Type', StringType()),
    StructField('Gender', StringType()),
    StructField('Amount', FloatType()),
    StructField('City', StringType()),
    StructField('Country', StringType())
])
df6 = spark.read.json('/home/student/Desktop/CreditCard.json', schema = schema)
df6.printSchema()

**HOMEWORK:**

In the /home/student/ROI/SparkProgram/datasets/finance folder are two files:

top50banks.tsv is tab separated

countrycodes.json is a JSON file

Using these two files, create an alphabetical list of countries showing the sum, min, and max of their total assets. Then find the country with the most assets and report that amount.
