## Overview

This notebook shows how to create and query a table or DataFrame from a file that you uploaded to DBFS. [DBFS](https://docs.databricks.com/user-guide/dbfs-databricks-file-system.html) (the Databricks File System) lets you store data for querying inside Databricks. The notebook assumes that the file you want to read is already in DBFS.
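
As a rough illustration of how a DBFS location can be written: Spark readers and `%fs` generally accept the `dbfs:/...` form (or a bare `/...` path, as in this notebook), while local file APIs on the driver usually see the same file under a `/dbfs/...` mount. The sketch below only converts between those spellings. The helper names `to_dbfs_uri` and `to_local_path` and the file name `example.csv` are hypothetical, and the `/dbfs` mount is assumed to be available on the cluster.

```python
def to_dbfs_uri(path: str) -> str:
    """Return the dbfs:/ form of a DBFS path (hypothetical helper)."""
    if path.startswith("dbfs:/"):
        return path
    if path.startswith("/dbfs/"):
        # Local mount form: strip the /dbfs/ prefix.
        return "dbfs:/" + path[len("/dbfs/"):]
    # Bare form such as /FileStore/tables/...
    return "dbfs:/" + path.lstrip("/")


def to_local_path(path: str) -> str:
    """Return the /dbfs/... form (assumes the /dbfs mount exists)."""
    return "/dbfs/" + to_dbfs_uri(path)[len("dbfs:/"):]


# example.csv is a placeholder file name, not a file from this notebook.
print(to_dbfs_uri("/FileStore/tables/example.csv"))
print(to_local_path("dbfs:/FileStore/tables/example.csv"))
```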

This notebook is written in **Python**, so the default cell type is Python. You can switch languages in an individual cell with the `%LANGUAGE` syntax (for example, `%sql`). Python, Scala, SQL, and R are all supported.
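
For a cell in another language to see data loaded from Python, one common approach is to register the DataFrame as a temporary view. The following is a minimal sketch of that idea, not part of the original notebook: the helper `register_for_sql` and the view name `lockdown_csv` are hypothetical, and `df` and `spark` are assumed to exist once the load cell has run.

```python
def register_for_sql(df, view_name="lockdown_csv"):
    """Register df as a temporary view and return a sample query.

    view_name is a hypothetical name chosen for illustration.
    """
    df.createOrReplaceTempView(view_name)
    return f"SELECT * FROM {view_name} LIMIT 5"


# In the notebook, after `df` has been loaded:
#   query = register_for_sql(df)
#   spark.sql(query).show()          # from a Python cell
# or, in a separate cell that starts with the SQL magic:
#   %sql
#   SELECT * FROM lockdown_csv LIMIT 5
```

A temporary view lives only for the current Spark session, so nothing is written to DBFS by this step.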

In [2]:
# File location and type
file_location = "/FileStore/tables/Lockdown_Join_with_normaliztion_factor-ed399.csv"
file_type = "csv"

# CSV options
infer_schema = "true"
first_row_is_header = "true"
delimiter = ","

# The applied options are for CSV files. For other file types, these will be ignored.
df = spark.read.format(file_type) \
  .option("inferSchema", infer_schema) \
  .option("header", first_row_is_header) \
  .option("sep", delimiter) \
  .load(file_location)

##display(df)
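
The comment in the cell above notes that the CSV options are ignored for other file types. One way to make that explicit is to pass them only when the input is CSV. This is a sketch under assumptions rather than the notebook's own method: `reader_options` and `load_table` are hypothetical helpers, and `spark` is the session that Databricks notebooks provide.

```python
def reader_options(file_type, infer_schema="true", header="true", sep=","):
    """Build reader options, including the CSV settings only for CSV input."""
    if file_type.lower() == "csv":
        return {"inferSchema": infer_schema, "header": header, "sep": sep}
    return {}


def load_table(spark, file_location, file_type="csv"):
    """Load a file using only the options relevant to its type.

    `spark` is assumed to be the notebook's SparkSession.
    """
    return (
        spark.read.format(file_type)
        .options(**reader_options(file_type))
        .load(file_location)
    )


print(reader_options("csv"))
print(reader_options("parquet"))
```

In the notebook this could be called as `df = load_table(spark, file_location, file_type)`, which should behave like the cell above for CSV input.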

In [3]:
from pyspark import SparkContext

# Not needed here: Databricks notebooks already provide a SparkContext as `sc`.
# sc = SparkContext()

In [4]:
df.printSchema()


In [5]:
df.head(5)

In [6]:
df.show(2, truncate=True)

In [7]:
df.count()

In [8]:
len(df.columns), df.columns

In [9]:
df.describe().show()

In [10]:
df.describe('cases').show()

In [11]:
df.select('fips','cases').show(5)

In [12]:
df.select('fips','cases').distinct().count()

In [13]:
#df.crosstab('state', 'cases').show()

In [14]:
#df.crosstab('state', 'cases').dropDuplicates().show()

In [15]:
df.dropna().count()

In [16]:
df.groupby('state').agg({'cases': 'mean'}).show()

In [17]:
df.groupby('cases').count().show()

In [18]:
# PySpark DataFrames have no .map(); go through the underlying RDD instead.
# df.select('date').rdd.map(lambda x: (x, 1)).take(5)

In [19]:
df.orderBy(df.cases.desc()).show(5)

In [20]:
%fs ls

path,name,size
dbfs:/FileStore/,FileStore/,0
dbfs:/databricks-datasets/,databricks-datasets/,0
dbfs:/databricks-results/,databricks-results/,0
dbfs:/tmp/,tmp/,0


In [21]:
%fs ls dbfs:/databricks-datasets

path,name,size
dbfs:/databricks-datasets/,databricks-datasets/,0
dbfs:/databricks-datasets/COVID/,COVID/,0
dbfs:/databricks-datasets/README.md,README.md,976
dbfs:/databricks-datasets/Rdatasets/,Rdatasets/,0
dbfs:/databricks-datasets/SPARK_README.md,SPARK_README.md,3359
dbfs:/databricks-datasets/adult/,adult/,0
dbfs:/databricks-datasets/airlines/,airlines/,0
dbfs:/databricks-datasets/amazon/,amazon/,0
dbfs:/databricks-datasets/asa/,asa/,0
dbfs:/databricks-datasets/atlas_higgs/,atlas_higgs/,0
