# Downloading historical forex data from forexite
I got the URL from [here](https://www.quantshare.com/sa-421-6-places-to-download-historical-intraday-forex-quotes-data-for-free). It has 1-minute resolution data for [lots](./forexite_pairs.txt) of symbols. Each file contains one day's data for all symbols, so I need to rework the data into one file per symbol, store it in pandas and serialize it. I also need to handle time zones somehow...
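Here is one possible way to approach the time zone question, sketched under an assumption. This notebook does not say which zone the forexite timestamps are in, so `SOURCE_OFFSET_HOURS` below is a hypothetical placeholder that should be checked against the data provider. The sketch attaches that fixed offset to the naive timestamps `strptime` produces and converts them to UTC.

```python
import datetime

# HYPOTHETICAL: the source offset is not stated anywhere in this notebook.
# Check the data provider's documentation before relying on this value.
SOURCE_OFFSET_HOURS = 1
SOURCE_TZ = datetime.timezone(datetime.timedelta(hours=SOURCE_OFFSET_HOURS))


def to_utc(naive_dt, source_tz=SOURCE_TZ):
    """Interpret a naive timestamp in source_tz and return it as an aware UTC datetime."""
    return naive_dt.replace(tzinfo=source_tz).astimezone(datetime.timezone.utc)


# Timestamp taken from the sample row EURUSD,20191101,000900,...
stamp = datetime.datetime.strptime("20191101" + "000900", "%Y%m%d%H%M%S")
print(to_utc(stamp).isoformat())  # 2019-10-31T23:09:00+00:00 with the placeholder +1h offset
```

A fixed offset ignores daylight saving. If the source zone observes DST, a named zone (for example from `zoneinfo`) would be needed instead.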

```python
import datetime
import urllib.request
import zipfile
from pathlib import Path
```

```python
# Download one zip per day, from yesterday back to 364 days ago, skipping files already on disk.
date = datetime.datetime.now()
# The remote layout is YYYY/MM/DDMMYY.zip, e.g.
# https://www.forexite.com/free_forex_quotes/2011/11/011111.zip is 1 November 2011.
url = "https://www.forexite.com/free_forex_quotes/{}/{:02d}/{:02d}{:02d}{:02d}.zip"
# Local copies are named YYYY-MM-DD.zip so that they sort chronologically.
filename = "./forexite/{:04d}-{:02d}-{:02d}.zip"
Path('./forexite').mkdir(exist_ok=True)  # create the download directory once, up front
new_files = []

for x in range(1, 365):
    data_date = date - datetime.timedelta(days=x)
    year, month, day = data_date.year, data_date.month, data_date.day
    year_2 = year % 100  # two-digit year for the DDMMYY part of the name
    download = url.format(year, month, day, month, year_2)
    file = filename.format(year, month, day)
    file_obj = Path(file)
    if not file_obj.exists():
        print(download)
        urllib.request.urlretrieve(download, file)
        new_files.append(file_obj)
```


```
https://www.forexite.com/free_forex_quotes/2020/05/100520.zip
https://www.forexite.com/free_forex_quotes/2020/05/090520.zip
https://www.forexite.com/free_forex_quotes/2020/05/080520.zip
https://www.forexite.com/free_forex_quotes/2020/05/070520.zip
https://www.forexite.com/free_forex_quotes/2020/05/060520.zip
https://www.forexite.com/free_forex_quotes/2020/05/050520.zip
https://www.forexite.com/free_forex_quotes/2020/05/040520.zip
https://www.forexite.com/free_forex_quotes/2020/05/030520.zip
https://www.forexite.com/free_forex_quotes/2020/05/020520.zip
https://www.forexite.com/free_forex_quotes/2020/05/010520.zip
https://www.forexite.com/free_forex_quotes/2020/04/300420.zip
https://www.forexite.com/free_forex_quotes/2020/04/290420.zip
https://www.forexite.com/free_forex_quotes/2020/04/280420.zip
https://www.forexite.com/free_forex_quotes/2020/04/270420.zip
https://www.forexite.com/free_forex_quotes/2020/04/260420.zip
https://www.forexite.com/free_forex_quotes/2020/04/250420.zip
https://
```
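The URL and local file name logic in the download loop can be pulled out into small pure functions. That makes the `DDMMYY` naming easy to check without touching the network. This is a sketch: the helper names `remote_url`, `local_path` and `days_back` are hypothetical, while the URL pattern is the one the loop above uses.

```python
import datetime
from pathlib import Path

BASE_URL = "https://www.forexite.com/free_forex_quotes"


def remote_url(day):
    """Remote layout used by the download loop: YYYY/MM/DDMMYY.zip."""
    return f"{BASE_URL}/{day.year}/{day.month:02d}/{day:%d%m%y}.zip"


def local_path(day, folder="./forexite"):
    """Local cache name YYYY-MM-DD.zip, which sorts chronologically."""
    return Path(folder) / f"{day:%Y-%m-%d}.zip"


def days_back(today, count):
    """Yield the `count` calendar days before `today`, most recent first."""
    for offset in range(1, count + 1):
        yield today - datetime.timedelta(days=offset)


print(remote_url(datetime.date(2011, 11, 1)))
# https://www.forexite.com/free_forex_quotes/2011/11/011111.zip
```

`days_back(datetime.date.today(), 364)` covers the same window as the `range(1, 365)` loop.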

### Converting to CSV files, one ticker per file
I want one file for each currency pair. The script should read the existing file, work out whether there is any new data, add it, and then save the file again. I will read and analyse the files in a separate code path.
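Reading the per-ticker files back depends on the exact layout the parsing cell further down writes. There is no header, and each line holds `datetime, open, high, low, close` with a space after every comma and the datetime in `str()` form (`2019-11-01 00:09:00`). Below is a minimal standard-library sketch of a reader for that layout. `read_ticker_csv` is a hypothetical name. In pandas, the equivalent would likely be `read_csv` with `header=None`, `names=...` and `skipinitialspace=True`.

```python
import csv
import datetime
import io


def read_ticker_csv(lines):
    """Parse rows written as "datetime, open, high, low, close" (no header).

    Returns a list of (datetime, open, high, low, close) tuples with float prices.
    """
    rows = []
    # skipinitialspace drops the space that follows each comma in the written rows.
    for record in csv.reader(lines, skipinitialspace=True):
        if not record:
            continue  # ignore blank lines
        dt = datetime.datetime.fromisoformat(record[0])
        op, hi, lo, cl = (float(value) for value in record[1:5])
        rows.append((dt, op, hi, lo, cl))
    return rows


sample = io.StringIO("2019-11-01 00:09:00, 1.1150, 1.1150, 1.1150, 1.1150\n")
print(read_ticker_csv(sample)[0])
```

For a real file, pass an open handle, e.g. `with open("./csv/EURUSD.csv", newline="") as f: rows = read_ticker_csv(f)`.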

In fact, let's convert the forexite data into CSV files first; they are both human-readable and easy to load with pandas.
Thoughts:
- Do I dump everything into the CSV, then sort and dedupe?
- Should I use a list of tuples, `(datetime, o, h, l, c)`?
- I could load the existing file (if it exists) into a set; each new file can then be checked against it to see whether it is a duplicate.
- I only need the date to dedupe: from each file, get the date and check the set. If it is not there, add all of that file's rows to the list of tuples.
- Then sort the rows and write them out.
- This might have quite high memory requirements. Can we make some assumptions about the order in which the files are presented? That would be challenging if there were gaps in the data; is this an edge case?
- Let's start with the first-run case and optimise later if need be.
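The plan in the list above can be sketched as follows: keep a set of dates already present, skip any day that is already there, then sort. The sketch assumes, as stated at the top, that each daily file holds bars for a single calendar date, so the first bar's date identifies the whole file. `Bar` and `merge_days` are hypothetical names, and prices are floats rather than the strings `parse_row` returns.

```python
import datetime
from typing import Iterable, List, NamedTuple


class Bar(NamedTuple):
    """One 1-minute bar; prices as floats (parse_row itself keeps them as strings)."""
    dt: datetime.datetime
    open: float
    high: float
    low: float
    close: float


def merge_days(existing: List[Bar], new_days: Iterable[List[Bar]]) -> List[Bar]:
    """Add each new day's bars unless that calendar date is already present, then sort."""
    seen_dates = {bar.dt.date() for bar in existing}
    merged = list(existing)
    for day_bars in new_days:
        if not day_bars:
            continue  # an empty file contributes nothing
        # ASSUMPTION: every bar in a daily file shares one calendar date.
        day = day_bars[0].dt.date()
        if day in seen_dates:
            continue  # this day is already in the output (e.g. from a previous run)
        seen_dates.add(day)
        merged.extend(day_bars)
    merged.sort(key=lambda bar: bar.dt)
    return merged
```

Here `existing` would come from the current per-ticker CSV and `new_days` from the freshly downloaded zips. Everything is held in memory, which matches the "first-run case first, optimise later" approach.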

```python
def parse_row(row_from_file):
    """Split one forexite row into (ticker, (datetime, open, high, low, close)).

    A row looks like:
    EURUSD,20191101,000900,1.1150,1.1150,1.1150,1.1150
    Prices are returned as strings, exactly as they appear in the file.
    """
    tokens = row_from_file.split(',')
    assert len(tokens) == 7, f"Expected 7 tokens, got {len(tokens)} in: {row_from_file}"

    # Date (YYYYMMDD) and time (HHMMSS) are separate columns; join them and parse once.
    date_time_str = tokens[1].strip() + tokens[2].strip()
    dt = datetime.datetime.strptime(date_time_str, '%Y%m%d%H%M%S')

    ticker = tokens[0].strip()
    opener, high, low, close = (token.strip() for token in tokens[3:7])

    return ticker, (dt, opener, high, low, close)
```
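`parse_row` works on rows that have already been decoded from bytes. The `csv` module can do the splitting while reading straight from the zipped daily file, if the zip member is wrapped in `io.TextIOWrapper`. This sketch makes some assumptions. Like the parsing cell below, it takes the first member of the zip and skips one header row. Unlike `parse_row`, it skips malformed lines instead of asserting, and it converts prices to floats. `iter_zip_rows` is a hypothetical name.

```python
import csv
import datetime
import io
import zipfile


def iter_zip_rows(zip_path):
    """Yield (ticker, (datetime, open, high, low, close)) from a forexite daily zip."""
    with zipfile.ZipFile(zip_path) as z:
        member = z.namelist()[0]  # each daily zip holds a single text file
        # TextIOWrapper decodes the binary zip stream so csv.reader gets str lines.
        with io.TextIOWrapper(z.open(member), encoding="utf-8", newline="") as text:
            reader = csv.reader(text)
            next(reader)  # skip the header row
            for record in reader:
                if len(record) != 7:
                    continue  # skip blank or malformed lines
                ticker, day, time_of_day = (field.strip() for field in record[:3])
                dt = datetime.datetime.strptime(day + time_of_day, "%Y%m%d%H%M%S")
                prices = tuple(float(field) for field in record[3:7])
                yield ticker, (dt, *prices)
```

`zipfile.ZipFile` accepts either a path or a binary file object, so the same function works on the downloaded `./forexite/*.zip` files or on an in-memory buffer.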



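```python
import contextlib


def iter_rows(zip_path):
    """Yield parse_row() results for every data row in a forexite daily zip."""
    with zipfile.ZipFile(zip_path) as z:
        fn = z.namelist()[0]  # each zip holds a single DDMMYY.txt file
        print(f"Processing {fn}")
        with z.open(fn) as f:
            next(f)  # skip the header row
            for row in f:
                clean_row = row.decode("utf-8").strip()
                if clean_row:  # ignore blank lines instead of failing the assert
                    yield parse_row(clean_row)


print("Generating file list")
# Process only the files downloaded in this session, or every zip on disk if there were none.
# Set new_files = None to force a full re-run. Output files are opened in append mode,
# so re-processing a zip that was already converted appends duplicate rows.
files = new_files or Path('./forexite/').glob('*.zip')
files = sorted(files)  # YYYY-MM-DD names sort chronologically
assert files, "No zip files found in ./forexite"

if not Path('./csv').is_dir():
    print("Creating output directory")
    Path('./csv').mkdir()

# ExitStack closes every output file when the block ends, even if parsing raises.
with contextlib.ExitStack() as stack:
    print("Parsing available tickers and Opening output files")
    # The most recent file decides which tickers get an output file.
    file_handles = {}
    for ticker, _ in iter_rows(files[-1]):
        if ticker not in file_handles:
            file_handles[ticker] = stack.enter_context(open(f"./csv/{ticker}.csv", "a"))

    print("Parsing files")
    for file in files:
        for ticker, ohlc in iter_rows(file):
            if ticker in file_handles:  # tickers missing from the latest file are skipped
                file_handles[ticker].write("{}, {}, {}, {}, {}\n".format(*ohlc))

    print("Closing output files")
```
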
Generating file list
Parsing available tickers and Opening output files
Processing 100520.txt
Parsing files
Processing 210320.txt
Processing 220320.txt
Processing 230320.txt
Processing 240320.txt
Processing 250320.txt
Processing 260320.txt
Processing 270320.txt
Processing 280320.txt
Processing 290320.txt
Processing 300320.txt
Processing 310320.txt
Processing 010420.txt
Processing 020420.txt
Processing 030420.txt
Processing 040420.txt
Processing 050420.txt
Processing 060420.txt
Processing 070420.txt
Processing 080420.txt
Processing 090420.txt
Processing 100420.txt
Processing 110420.txt
Processing 120420.txt
Processing 130420.txt
Processing 140420.txt
Processing 150420.txt
Processing 160420.txt
Processing 170420.txt
Processing 180420.txt
Processing 190420.txt
Processing 200420.txt
Processing 210420.txt
Processing 220420.txt
Processing 230420.txt
Processing 240420.txt
Processing 250420.txt
Processing 260420.txt
Processing 270420.txt
Processing 280420.txt
Processing 290420.txt
Processing 3