# Downloading historical forex data from forexite
I got the URL from [here](https://www.quantshare.com/sa-421-6-places-to-download-historical-intraday-forex-quotes-data-for-free). It has 1-minute resolution data for [lots](./forexite_pairs.txt) of symbols. Each file contains one day's data for all symbols. I need to rework the data into one file per symbol, then store it in pandas and serialize it. Need to handle time zones somehow...

In [1]:
import datetime
import zipfile
import urllib.request
from pathlib import Path

In [2]:
date = datetime.datetime.now()
# e.g. url = "https://www.forexite.com/free_forex_quotes/2011/11/011111.zip"
# The directory is YYYY/MM and the file name is DDMMYY.
url = "https://www.forexite.com/free_forex_quotes/{}/{:02d}/{:02d}{:02d}{:02d}.zip"
filename = "./forexite/{:04d}-{:02d}-{:02d}.zip"
new_files = []

# Make sure the download directory exists before writing into it
Path("./forexite").mkdir(parents=True, exist_ok=True)

# Walk back from yesterday over the previous 364 days
for x in range(1, 365):
    data_date = date - datetime.timedelta(days=x)
    year = data_date.year
    year_2 = year % 100  # two-digit year for the file name
    month = data_date.month
    day = data_date.day
    download = url.format(year, month, day, month, year_2)
    file = filename.format(year, month, day)
    file_obj = Path(file)
    # Only fetch days that are not already on disk
    if not file_obj.exists():
        print(download)
        urllib.request.urlretrieve(download, file)
        new_files.append(file_obj)


https://www.forexite.com/free_forex_quotes/2020/01/120120.zip
https://www.forexite.com/free_forex_quotes/2020/01/110120.zip
https://www.forexite.com/free_forex_quotes/2020/01/100120.zip
https://www.forexite.com/free_forex_quotes/2020/01/090120.zip
https://www.forexite.com/free_forex_quotes/2020/01/080120.zip
https://www.forexite.com/free_forex_quotes/2020/01/070120.zip
https://www.forexite.com/free_forex_quotes/2020/01/060120.zip
https://www.forexite.com/free_forex_quotes/2020/01/050120.zip
https://www.forexite.com/free_forex_quotes/2020/01/040120.zip
https://www.forexite.com/free_forex_quotes/2020/01/030120.zip
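
The URL layout used above (a `YYYY/MM` directory followed by a `DDMMYY` file name) can also be built with `strftime` format codes instead of separate integer fields. This is only a sketch; `forexite_url` and `BASE_URL` are hypothetical names that do not appear elsewhere in this notebook.

```python
import datetime

BASE_URL = "https://www.forexite.com/free_forex_quotes"


def forexite_url(day):
    """Return the download URL for one calendar day (hypothetical helper)."""
    # %Y/%m gives the YYYY/MM directory, %d%m%y gives the DDMMYY file name
    return f"{BASE_URL}/{day:%Y/%m}/{day:%d%m%y}.zip"


# Reproduces the example URL from the comment in the download cell
print(forexite_url(datetime.date(2011, 11, 1)))
```

Formatting the date directly avoids having to keep the argument order of `url.format(...)` in sync with the day, month, and year positions in the template.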


### Converting to pandas dataframes
I want one data frame for each currency pair. I would like it to read the current saved dataframe, figure out if there is any new data and add it, and then save the dataframe again. I will read and analyse the dataframes in a different codepath.

In fact, let's convert the forexite data into CSV files first; these are both human readable and pandas readable.
Thoughts:
- Do I dump everything into the CSV, then sort and dedupe?
- Should I use a list of tuples, (datetime, o, h, l, c)?
- I could load the existing file (if it exists) into a set, so each new file can be checked for duplicates.
- I only need the date to dedupe: from each file, get the date and check the set. If it is not there, add all the rows to the list of tuples.
- Then sort them and write them out.
- This might have quite high memory requirements. Can we make some assumptions about the order the files are presented to us? It would be challenging if there were gaps in the data; is this an edge case?
- Let's start with the first-run case; we can optimise later if need be.
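
The date-based dedupe idea from the list above could be sketched roughly as follows. It assumes each row is a `(datetime, o, h, l, c)` tuple and that every daily file carries a single date, so checking the first row's date is enough. `merge_by_date` is a hypothetical name, and this is not the code the notebook ends up running below.

```python
import datetime


def merge_by_date(existing_rows, new_rows_by_file):
    """Merge per-file batches into existing rows, skipping dates already seen."""
    # Dates that are already present in the saved data
    seen_dates = {row[0].date() for row in existing_rows}
    merged = list(existing_rows)
    for rows in new_rows_by_file:
        if not rows:
            continue
        # ASSUMPTION: one file holds one day, so the first row identifies it
        file_date = rows[0][0].date()
        if file_date in seen_dates:
            continue
        seen_dates.add(file_date)
        merged.extend(rows)
    # Sort on the timestamp only, so price strings are never compared
    merged.sort(key=lambda row: row[0])
    return merged


dt = datetime.datetime
existing = [(dt(2019, 11, 1, 0, 9), "1.1150", "1.1150", "1.1150", "1.1150")]
batches = [
    [(dt(2019, 11, 2, 0, 1), "1.1160", "1.1161", "1.1159", "1.1160"),
     (dt(2019, 11, 2, 0, 0), "1.1158", "1.1160", "1.1158", "1.1160")],
    [(dt(2019, 11, 1, 0, 9), "1.1150", "1.1150", "1.1150", "1.1150")],  # already stored
]
merged = merge_by_date(existing, batches)
print(len(merged))
```

Everything is held in memory here, which is the concern raised in the list; it is meant only to make the set-of-dates check concrete.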

In [3]:
def parse_row(row_from_file):
    """Split one forexite row into its ticker and a (datetime, o, h, l, c) tuple."""
    # Row looks like:
    # EURUSD,20191101,000900,1.1150,1.1150,1.1150,1.1150
    tokens = row_from_file.split(',')
    assert len(tokens) == 7, f"Expected 7 tokens in: {row_from_file}"

    # Date (YYYYMMDD) and time (HHMMSS) are separate columns; join and parse them
    date_time_str = tokens[1] + tokens[2]
    dt = datetime.datetime.strptime(date_time_str, '%Y%m%d%H%M%S')

    ticker = tokens[0]
    # Prices stay as strings because they are written straight back out
    opener = tokens[3]
    high = tokens[4]
    low = tokens[5]
    close = tokens[6]

    return ticker, (dt, opener, high, low, close)
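

The datetimes produced by `parse_row` are naive, which leaves the time zone question from the introduction open. One possible approach is to attach the source offset and convert to UTC before writing. The offset used here is a placeholder assumption, since the actual time zone of the data is not stated in this notebook; `SOURCE_TZ` and `to_utc` are hypothetical names.

```python
import datetime

# ASSUMPTION: UTC+1 is only a placeholder; check the data provider's
# documentation for the real offset before relying on this.
SOURCE_TZ = datetime.timezone(datetime.timedelta(hours=1))


def to_utc(naive_dt, source_tz=SOURCE_TZ):
    """Interpret a naive datetime in source_tz and return it in UTC."""
    aware = naive_dt.replace(tzinfo=source_tz)
    return aware.astimezone(datetime.timezone.utc)


# Same date and time fields as the sample row in parse_row
naive = datetime.datetime.strptime("20191101" + "000900", "%Y%m%d%H%M%S")
utc_dt = to_utc(naive)
print(utc_dt.isoformat())
```

A fixed offset ignores daylight saving changes; if the source follows a regional time zone, `zoneinfo.ZoneInfo` from the standard library would be the more appropriate `tzinfo`.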



In [4]:
file_handles = {}

# Make sure the output directory exists before opening files in it
Path("./csv").mkdir(parents=True, exist_ok=True)

# Open one output file per ticker, in append mode so earlier runs are kept
print("Opening output files")
with open("./tickers_forexite.txt") as tickers:
    for ticker in tickers:
        clean_ticker = ticker.strip()
        if not clean_ticker:
            continue  # skip blank lines in the ticker list
        file_handles[clean_ticker] = open(f"./csv/{clean_ticker}.csv", "a")

# Process only the newly downloaded files; fall back to everything on disk
print("Loading file list")
files = new_files
if not files:
    files = Path('./forexite/').glob('*.zip')
files = sorted(files)

print("Parsing files")
for file in files:
    with zipfile.ZipFile(file) as z:
        # Use the first member of the archive (assumed to be the only one)
        fn = z.namelist()[0]
        print(f"Processing {fn}")
        with z.open(fn) as f:
            next(f)  # skip the first line (header)
            for row in f:
                clean_row = row.decode("utf-8").strip()
                ticker, ohlc = parse_row(clean_row)
                if ticker not in file_handles:
                    continue
                file_handles[ticker].write("{}, {}, {}, {}, {}\n".format(*ohlc))

print("Closing output files")
for handle in file_handles.values():
    handle.close()

Opening output files
Loading file list
Parsing files
Processing 030120.txt
Processing 040120.txt
Processing 050120.txt
Processing 060120.txt
Processing 070120.txt
Processing 080120.txt
Processing 090120.txt
Processing 100120.txt
Processing 110120.txt
Processing 120120.txt
Closing output files
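
With one CSV per ticker on disk, loading a file into a dataframe might look something like the sketch below. It assumes the five-column layout written by the cell above (a timestamp followed by open, high, low, close, separated by a comma and a space, with no header row). The column names and the `load_ticker_csv` helper are hypothetical, and an in-memory sample stands in for a real file.

```python
import io

import pandas as pd

# Hypothetical column names; the CSV files themselves have no header row
COLUMNS = ["datetime", "open", "high", "low", "close"]


def load_ticker_csv(source):
    """Read one per-ticker CSV into a time-indexed, deduplicated dataframe."""
    df = pd.read_csv(
        source,
        names=COLUMNS,
        skipinitialspace=True,  # the writer puts a space after each comma
        parse_dates=["datetime"],
    )
    # Appending across runs can repeat rows, so keep one row per timestamp
    df = df.drop_duplicates(subset="datetime", keep="last")
    return df.set_index("datetime").sort_index()


sample = io.StringIO(
    "2019-11-01 00:10:00, 1.1151, 1.1152, 1.1150, 1.1151\n"
    "2019-11-01 00:09:00, 1.1150, 1.1150, 1.1150, 1.1150\n"
    "2019-11-01 00:09:00, 1.1150, 1.1150, 1.1150, 1.1150\n"
)
frame = load_ticker_csv(sample)
print(frame)
```

Deduplicating on load means the append-only CSV files do not need to be kept clean, at the cost of reading the whole file each time.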
