In [94]:
# When I tried to do this in pure Julia, I had to download the data with HTTP.jl and then use CSV.jl to read it
# into a Julia DataFrame. But CSV.jl refused to recognize the true structure of the CSV file, whose columns differ
# from those in previous versions of the NSRDB: the file keeps the old structure but fills it with new values.
# CSV.jl couldn't make sense of that, but Pandas's read_csv can, so I gave in and am using Pandas via Pandas.jl.

# Ideally I'd find a way to do this in Julia, but I'm fine with the overhead of calling into Python, because once
# we're there, Pandas itself is quite fast; reading 1 row takes about as long as reading all of them.
# And it returns correct values, which I couldn't figure out how to do simply with HTTP.jl + CSV.jl.

# using CSV
# using HTTP
using Pandas

In [95]:
# Declare the query parameters as strings (lat and lon are numbers that get interpolated into the URL).
# Spaces must be replaced with '+', e.g., change 'John Smith' to 'John+Smith'.

# Define the latitude and longitude of the location
lat, lon = 9.931592, -84.107174

# Set the attributes to extract (e.g., dhi, ghi, etc.), separated by commas.
attributes = "ghi,dhi,dni,wind_speed,air_temperature,solar_zenith_angle"

# Choose year of data
year = "2017"

# Set leap year to true or false. True will return leap day data if present, false will not.
leap_year = "false"

# Set the time interval in minutes, e.g., '30' gives half-hour intervals. Valid intervals are 30 and 60.
interval = "30"

# Specify Coordinated Universal Time (UTC), 'true' will use UTC, 'false' will use the local time zone of the data.
# NOTE: In order to use the NSRDB data in SAM, you must specify UTC as 'false'. SAM requires the data to be in the
# local time zone.
utc = "false"

# Your full name, use '+' instead of spaces.
your_name = "Arnav+Gautam"

# Your reason for using the NSRDB.
reason_for_use = "data+analysis"

# Your affiliation
your_affiliation = "no+affiliation"

# Your email address
your_email = "arnavgautam@berkeley.edu"

# Please join our mailing list so we can keep you up-to-date on new developments.
mailing_list = "false";
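The comment above asks for spaces in free-text fields to be replaced with '+'. A minimal sketch of doing that in Julia instead of by hand; plus_encode is a hypothetical helper, not part of the NSRDB API, and it only handles spaces (characters such as '&' or '#' would still need proper URL escaping).

```julia
# Hypothetical helper: swap spaces for '+' so a free-text value can go into the query string.
# Only spaces are handled; other reserved URL characters are left untouched.
plus_encode(s::AbstractString) = replace(s, " " => "+")

println(plus_encode("John Smith"))     # John+Smith
println(plus_encode("data analysis"))  # data+analysis
```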

In [96]:
# You must have an NSRDB API key. Read it from a file and strip the trailing newline,
# so that it doesn't end up inside the URL.
api_key = strip(read("arnav-nrel-api-key.txt", String))

In [97]:
# Build the PSM3 CSV download URL from the parameters above
url = "http://developer.nrel.gov/api/solar/nsrdb_psm3_download.csv?wkt=POINT($(lon)%20$(lat))&names=$(year)&leap_day=$(leap_year)&interval=$(interval)&utc=$(utc)&full_name=$(your_name)&email=$(your_email)&affiliation=$(your_affiliation)&mailing_list=$(mailing_list)&reason=$(reason_for_use)&api_key=$(api_key)&attributes=$(attributes)";
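The URL literal above is long and easy to get wrong. The sketch below assembles the same kind of query from name => value pairs; build_query, example_params, and example_url are hypothetical names, only a subset of the parameters is shown, and the API key is a placeholder. The remaining parameters (full_name, email, affiliation, mailing_list, reason) would be added the same way.

```julia
# Sketch: build a query string from name => value pairs (values must already be URL-safe).
# Hypothetical helper; it does no escaping of its own.
function build_query(base::AbstractString, params)
    # "name=value" for each pair, joined with '&'
    return base * "?" * join(("$(k)=$(v)" for (k, v) in params), "&")
end

example_params = [
    "wkt" => "POINT($(-84.107174)%20$(9.931592))",  # longitude, encoded space, latitude
    "names" => "2017",
    "leap_day" => "false",
    "interval" => "30",
    "utc" => "false",
    "attributes" => "ghi,dhi,dni,wind_speed,air_temperature,solar_zenith_angle",
    "api_key" => "YOUR_API_KEY",  # placeholder, not a real key
]

example_url = build_query("http://developer.nrel.gov/api/solar/nsrdb_psm3_download.csv", example_params)
println(example_url)
```

Keeping the parameters as pairs makes it easier to spot a missing '&' or a misspelled name than in a single interpolated string.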

In [99]:
# The first two rows are not data values:
# the column names and the "first row" are metadata identifiers and values, respectively.
# The "second row" holds what we'd consider the column names for the actual data, which starts at row 3.
# Note that there are 46 metadata "columns" but only 11 data "columns"; this trips up CSV.jl when it reads the file.

# Pure Julia implementation (commented out; the output below is from an earlier run of it,
# for a different location than the one configured above)
# res = HTTP.get(url);
# info = CSV.read(res.body; limit=2)
# show(info, allcols=true)

# Alternate Implementation with Pandas
# info = read_csv(url, nrows=2)
# show(info)

2×46 DataFrames.DataFrame
│ Row │ Source │ Location ID │ City   │ State  │ Country │ Latitude │
│     │ String │ String      │ String │ String │ String  │ String   │
├─────┼────────┼─────────────┼────────┼────────┼─────────┼──────────┤
│ 1   │ NSRDB  │ 693763      │ -      │ -      │ -       │ 33.21    │
│ 2   │ Year   │ Month       │ Day    │ Hour   │ Minute  │ GHI      │

│ Row │ Longitude │ Time Zone │ Elevation  │ Local Time Zone │
│     │ String    │ String    │ String     │ String          │
├─────┼───────────┼───────────┼────────────┼─────────────────┤
│ 1   │ -97.14    │ -6        │ 203        │ -6              │
│ 2   │ DHI       │ DNI       │ Wind Speed │ Temperature     │

│ Row │ Clearsky DHI Units │ Clearsky DNI Units │ Clearsky GHI Units │
│     │ String             │ String⍰            │ String⍰            │
├─────┼────────────────────┼───────
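The layout described above can be handled without CSV.jl by treating the first two lines as metadata and the third as the real header. This is a minimal stdlib-only sketch under those assumptions: split_psm3 is a hypothetical helper, the sample is a hand-made, abbreviated file with the same shape (metadata values taken from the output above, data rows illustrative), and the naive comma split does not handle quoted fields.

```julia
# Sketch: split an NSRDB PSM3-style CSV into metadata, data header, and numeric data rows.
# Assumed layout: line 1 = metadata names, line 2 = metadata values,
# line 3 = data column names, line 4 onward = data. Quoted fields are not supported.
function split_psm3(text::AbstractString)
    # Drop blank lines and any trailing '\r' from Windows line endings
    lines = [strip(l) for l in split(text, '\n') if !isempty(strip(l))]
    meta = Dict(zip(split(lines[1], ','), split(lines[2], ',')))
    header = split(lines[3], ',')
    rows = [parse.(Float64, split(l, ',')) for l in lines[4:end]]
    # Each data row must match the 3rd-line header, not the wider 1st-line one
    all(length(r) == length(header) for r in rows) || error("data rows do not match header")
    return meta, header, rows
end

# Hand-made, abbreviated sample with the same shape as the real file (illustrative values)
sample = """
Source,Location ID,City,State,Latitude,Longitude
NSRDB,693763,-,-,33.21,-97.14
Year,Month,Day,Hour,Minute
2010,1,1,0,0
2010,1,1,0,30
"""

meta, header, rows = split_psm3(sample)
println(meta["Location ID"])  # 693763
println(header)
println(length(rows))         # 2
```

Here the metadata line has 6 fields while the data has 5, mirroring (on a small scale) the 46 vs 11 mismatch that confuses a reader that takes line 1 as the header.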

In [93]:
# Read the actual data, "row" 3 and below

# Incorrect Julia implementation (tries to read all 46 columns)
# nsrdb_data_frame = CSV.read(res.body; skipto=4, limit=2);
# show(nsrdb_data_frame, allcols=true)

# Working Pandas implementation (reads only the 11 data columns)
# Note: the output below is from an earlier run for 2010, not the 2017 request configured above
nsrdb_data_frame = read_csv(url, skiprows=2);
show(nsrdb_data_frame)

       Year  Month  Day  Hour  ...  DNI  Wind Speed  Temperature  Solar Zenith Angle
0      2010      1    1     0  ...    0         2.9           -2              167.60
1      2010      1    1     0  ...    0         2.7           -2              169.78
2      2010      1    1     1  ...    0         2.5           -3              168.08
3      2010      1    1     1  ...    0         2.4           -3              163.67
4      2010      1    1     2  ...    0         2.4           -4              158.14
...     ...    ...  ...   ...  ...  ...         ...          ...                 ...
17515  2010     12   31    21  ...    0         3.9            0              138.93
17516  2010     12   31    22  ...    0         3.8           -1              145.18
17517  2010     12   31    22  ...    0         3.6           -1              151.36
17518  2010     12   31    23  ...    0         3.4           -2              157.36
17519  2010     12   31    23  ...    0         3.1           -2 
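As a sanity check, the index above runs from 0 to 17519, i.e. 17,520 rows. That is what one 365-day year at 30-minute intervals should produce (365 * 24 * 60 / 30). A small sketch of the arithmetic; expected_rows is a hypothetical helper, and passing days = 366 assumes a leap year requested with leap_day=true.

```julia
# Expected row count for one year of PSM3 data at a given interval in minutes.
# Hypothetical helper; days = 365 assumes a non-leap year or leap_day=false.
expected_rows(interval_min::Integer; days::Integer = 365) = days * 24 * 60 ÷ interval_min

println(expected_rows(30))              # 17520
println(expected_rows(60))              # 8760
println(expected_rows(30; days = 366))  # 17568
```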

In [91]:
# Show the column names of the data block (the real header, not the metadata one):
columns(nsrdb_data_frame)

Index(['Year', 'Month', 'Day', 'Hour', 'Minute', 'GHI', 'DHI', 'DNI',
       'Wind Speed', 'Temperature', 'Solar Zenith Angle'],
      dtype='object')
