# Downloading historical forex data from forexite
I got the URL from [here](https://www.quantshare.com/sa-421-6-places-to-download-historical-intraday-forex-quotes-data-for-free). Forexite has 1-minute resolution data for [lots](./forexite_pairs.txt) of symbols. Each file contains one day's data for all symbols, so I need to rework the data into one file per symbol, then load it into pandas and serialize it. I also need to handle time zones somehow...
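The time-zone question can be handled at parse time. The zone of the forexite timestamps is not stated here, so this sketch treats it as a parameter: `SOURCE_TZ` and `to_utc` are hypothetical names, and the UTC default is an assumption rather than a known fact about the data. It attaches the source zone to the naive timestamp and converts the result to UTC.

```python
import datetime

# Assumption / hypothetical placeholder: the zone the raw forexite timestamps
# are recorded in. Check the data source before relying on this default.
SOURCE_TZ = datetime.timezone.utc


def to_utc(date_str, time_str, source_tz=SOURCE_TZ):
    """Turn forexite-style 'YYYYMMDD' and 'HHMMSS' fields into an aware UTC datetime."""
    naive = datetime.datetime.strptime(date_str + time_str, "%Y%m%d%H%M%S")
    # Label the naive timestamp as being in source_tz, then convert it to UTC.
    return naive.replace(tzinfo=source_tz).astimezone(datetime.timezone.utc)


# Illustration only: pretend the data were recorded at a fixed +02:00 offset.
plus_two = datetime.timezone(datetime.timedelta(hours=2))
print(to_utc("20191101", "000900", plus_two))  # 2019-10-31 22:09:00+00:00
```

A fixed offset ignores daylight saving. If the data turns out to follow a DST-observing zone, a `zoneinfo.ZoneInfo` object (Python 3.9+) can be passed as `source_tz` instead.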

```python
import datetime
import zipfile
import urllib.request
from pathlib import Path
```

```python
# Download one zip per day, walking backwards from yesterday.
# e.g. url = "https://www.forexite.com/free_forex_quotes/2011/11/011111.zip"
# i.e. .../free_forex_quotes/YYYY/MM/DDMMYY.zip
url = "https://www.forexite.com/free_forex_quotes/{}/{:02d}/{:02d}{:02d}{:02d}.zip"
filename = "./forexite/{:04d}-{:02d}-{:02d}.zip"
today = datetime.datetime.now()
new_files = []  # zips fetched on this run

# Create the download directory once rather than checking on every iteration.
Path('./forexite').mkdir(exist_ok=True)

for offset in range(1, 80):
    data_date = today - datetime.timedelta(days=offset)
    year = data_date.year
    year_2 = year % 100  # two-digit year used in the zip name
    month = data_date.month
    day = data_date.day
    download = url.format(year, month, day, month, year_2)
    file_obj = Path(filename.format(year, month, day))
    if file_obj.exists():
        continue  # already downloaded on a previous run
    print(download)
    urllib.request.urlretrieve(download, str(file_obj))
    new_files.append(file_obj)
```


```
https://www.forexite.com/free_forex_quotes/2020/01/170120.zip
https://www.forexite.com/free_forex_quotes/2020/01/160120.zip
https://www.forexite.com/free_forex_quotes/2019/10/281019.zip
https://www.forexite.com/free_forex_quotes/2019/10/271019.zip
https://www.forexite.com/free_forex_quotes/2019/10/261019.zip
https://www.forexite.com/free_forex_quotes/2019/10/251019.zip
https://www.forexite.com/free_forex_quotes/2019/10/241019.zip
https://www.forexite.com/free_forex_quotes/2019/10/231019.zip
https://www.forexite.com/free_forex_quotes/2019/10/221019.zip
https://www.forexite.com/free_forex_quotes/2019/10/211019.zip
https://www.forexite.com/free_forex_quotes/2019/10/201019.zip
https://www.forexite.com/free_forex_quotes/2019/10/191019.zip
https://www.forexite.com/free_forex_quotes/2019/10/181019.zip
https://www.forexite.com/free_forex_quotes/2019/10/171019.zip
https://www.forexite.com/free_forex_quotes/2019/10/161019.zip
https://www.forexite.com/free_forex_quotes/2019/10/151019.zip
https://
```
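The URL arithmetic in the download cell can also be expressed with `strftime`, which handles the zero-padding and the two-digit year. This is a minimal sketch; `forexite_url`, `local_zip_path` and `days_back` are hypothetical helper names, and the URL layout is taken from the example comment in the download cell (`.../YYYY/MM/DDMMYY.zip`).

```python
import datetime
from pathlib import Path

BASE = "https://www.forexite.com/free_forex_quotes/"


def forexite_url(day):
    """Build the download URL for one day: .../YYYY/MM/DDMMYY.zip."""
    return BASE + day.strftime("%Y/%m/%d%m%y") + ".zip"


def local_zip_path(day, folder="./forexite"):
    """Local file name used by the download cell: YYYY-MM-DD.zip."""
    return Path(folder) / day.strftime("%Y-%m-%d.zip")


def days_back(start, n):
    """Yield the n days before start, most recent first."""
    for offset in range(1, n + 1):
        yield start - datetime.timedelta(days=offset)


for d in days_back(datetime.date(2011, 11, 2), 1):
    print(forexite_url(d))  # https://www.forexite.com/free_forex_quotes/2011/11/011111.zip
```

`%y` yields the two-digit year directly, so the separate `year_2` computation is not needed with this approach.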

### Converting to CSV files, one ticker per file
I want one file for each currency pair. Ideally the code would read the existing file, work out whether there is any new data, append it, and then save the file again. I will read and analyse the files in a separate code path.

Actually, let's convert the forexite data into CSV files first: they are human-readable and easy to load with pandas.
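One way to "figure out if there is any new data" is to find the latest timestamp already in a ticker's CSV and keep only rows newer than it. A minimal sketch, assuming the per-ticker CSV has no header and each line starts with the timestamp as written by `str(datetime)` (e.g. `2019-11-01 00:09:00`); `latest_timestamp`, `last_timestamp` and `rows_after` are hypothetical helpers. It only works if days are processed in chronological order, which is the ordering question raised in the notes that follow.

```python
import csv
import datetime
from pathlib import Path


def latest_timestamp(lines):
    """Return the latest timestamp in CSV lines (a list or an open file), or None."""
    latest = None
    for record in csv.reader(lines):
        if not record:
            continue  # skip blank lines
        ts = datetime.datetime.fromisoformat(record[0].strip())
        if latest is None or ts > latest:
            latest = ts
    return latest


def last_timestamp(csv_path):
    """Latest timestamp in an existing per-ticker CSV, or None if the file is missing."""
    path = Path(csv_path)
    if not path.exists():
        return None
    with path.open(newline="") as f:
        return latest_timestamp(f)


def rows_after(rows, cutoff):
    """Keep only (datetime, o, h, l, c) rows strictly newer than cutoff."""
    if cutoff is None:
        return list(rows)
    return [row for row in rows if row[0] > cutoff]
```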
Thoughts:
- Do I dump everything into the CSV, then sort and dedupe?
- Should I use a list of tuples, `(datetime, o, h, l, c)`?
- I could load the existing file (if it exists) into a set, so each new file can be checked for duplicates.
- I only need the date to dedupe: from each file, get the date and check the set. If it is not there, add all of that file's rows to the list of tuples.
- Then sort the rows and write them out.
- This might have quite high memory requirements. Can we make assumptions about the order in which the files are presented to us? That would be challenging if there were gaps in the data; is that an edge case?
- Let's start with the first-run case and optimise later if need be.
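The plan in these notes (a set of dates already seen, a list of `(datetime, o, h, l, c)` tuples, then sort and write) can be sketched in memory for the first-run case. `merge_day` is a hypothetical helper, and the sketch assumes all rows for a ticker in one daily file share a single calendar date.

```python
import datetime


def merge_day(rows, seen_dates, day_rows):
    """Add one day's (datetime, o, h, l, c) rows unless that date was already seen.

    rows: list of tuples collected so far (mutated in place)
    seen_dates: set of datetime.date values already in rows (mutated in place)
    day_rows: rows for one ticker from a single daily file
    Returns True if the day was added, False if it was empty or a duplicate.
    """
    if not day_rows:
        return False
    day = day_rows[0][0].date()  # assumption: one calendar date per daily file
    if day in seen_dates:
        return False
    seen_dates.add(day)
    rows.extend(day_rows)
    return True


rows, seen = [], set()
d2 = [(datetime.datetime(2019, 11, 2, 0, 0), "1.1", "1.2", "1.0", "1.1")]
d1 = [(datetime.datetime(2019, 11, 1, 0, 9), "1.1150", "1.1150", "1.1150", "1.1150")]
merge_day(rows, seen, d2)
merge_day(rows, seen, d1)
merge_day(rows, seen, d1)  # same date again: ignored
rows.sort()                # tuples sort by their first element, the datetime
print(len(rows), rows[0][0])  # 2 2019-11-01 00:09:00
```

To extend this to an existing CSV, `seen_dates` can be seeded with `row[0].date()` for every row already in the file before merging new days.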

```python
def parse_row(row_from_file):
    """Split one line of a forexite daily file into (ticker, ohlc).

    A row looks like:
        EURUSD,20191101,000900,1.1150,1.1150,1.1150,1.1150
    i.e. ticker, date (YYYYMMDD), time (HHMMSS), open, high, low, close.
    Returns (ticker, (datetime, open, high, low, close)); prices stay as strings.
    """
    tokens = [token.strip() for token in row_from_file.split(',')]
    assert len(tokens) == 7, f"Expected 7 tokens, got {len(tokens)} in: {row_from_file}"

    # Join the date and time fields and parse them together.
    dt = datetime.datetime.strptime(tokens[1] + tokens[2], '%Y%m%d%H%M%S')

    ticker = tokens[0]
    opener, high, low, close = tokens[3:]
    return ticker, (dt, opener, high, low, close)
```

```python
def iter_rows(zip_path):
    """Yield (ticker, ohlc) for every data row in a forexite daily zip."""
    with zipfile.ZipFile(zip_path) as z:
        fn = z.namelist()[0]  # each zip holds a single text file
        print(f"Processing {fn}")
        with z.open(fn) as f:
            next(f)  # skip the header line
            for row in f:
                yield parse_row(row.decode("utf-8").strip())


print("Generating file list")
# First-run case: rebuild every CSV from all downloaded zips. Because every
# zip is reprocessed, the output files are opened with "w"; appending ("a")
# would duplicate all rows on each re-run. Processing only `new_files`
# incrementally is left for later.
files = sorted(Path('./forexite').glob('*.zip'))
assert files, "No zip files found in ./forexite"

if not Path('./csv').is_dir():
    print("Creating output directory")
    Path('./csv').mkdir()

print("Parsing available tickers and Opening output files")
# Tickers are taken from the most recent file; any ticker missing from it is skipped.
file_handles = {}
for ticker, _ in iter_rows(files[-1]):
    if ticker not in file_handles:
        file_handles[ticker] = open(f"./csv/{ticker}.csv", "w")

try:
    print("Parsing files")
    for file in files:
        for ticker, ohlc in iter_rows(file):
            if ticker in file_handles:
                # One line per bar: datetime,open,high,low,close (no header)
                file_handles[ticker].write("{},{},{},{},{}\n".format(*ohlc))
finally:
    print("Closing output files")
    for handle in file_handles.values():
        handle.close()
```

```
Generating file list
Parsing available tickers and Opening output files
Processing 170120.txt
Parsing files
Processing 201019.txt
Processing 211019.txt
Processing 221019.txt
Processing 231019.txt
Processing 241019.txt
Processing 251019.txt
Processing 261019.txt
Processing 271019.txt
Processing 281019.txt
Processing 291019.txt
Processing 301019.txt
Processing 311019.txt
Processing 011119.txt
Processing 021119.txt
Processing 031119.txt
Processing 041119.txt
Processing 051119.txt
Processing 061119.txt
Processing 071119.txt
Processing 081119.txt
Processing 091119.txt
Processing 101119.txt
Processing 111119.txt
Processing 121119.txt
Processing 131119.txt
Processing 141119.txt
Processing 151119.txt
Processing 161119.txt
Processing 171119.txt
Processing 181119.txt
Processing 191119.txt
Processing 201119.txt
Processing 211119.txt
Processing 221119.txt
Processing 231119.txt
Processing 241119.txt
Processing 251119.txt
Processing 261119.txt
Processing 271119.txt
Processing 281119.txt
Processing 2
```
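The conversion cell decodes each line by hand and skips the header with `next(f)`. An alternative is to wrap the zip member in `io.TextIOWrapper` and let the `csv` module split the fields. `iter_zip_rows` is a hypothetical helper; like the cell above, it assumes each zip holds a single text file whose first line is a header. The header text in the in-memory demo is a placeholder, not the real forexite header.

```python
import csv
import io
import zipfile


def iter_zip_rows(zip_source):
    """Yield the data rows (lists of stripped strings) from the single file in a zip."""
    with zipfile.ZipFile(zip_source) as z:
        member = z.namelist()[0]
        with z.open(member) as raw, io.TextIOWrapper(raw, encoding="utf-8", newline="") as text:
            reader = csv.reader(text)
            next(reader, None)  # skip the header line
            for record in reader:
                if record:  # ignore blank lines
                    yield [field.strip() for field in record]


# Self-contained demo: build a tiny zip in memory instead of reading ./forexite.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as z:
    z.writestr("011119.txt",
               "TICKER,DATE,TIME,OPEN,HIGH,LOW,CLOSE\n"  # placeholder header
               "EURUSD,20191101,000900,1.1150,1.1150,1.1150,1.1150\n")
buf.seek(0)
print(list(iter_zip_rows(buf)))
```

Each yielded list can then be joined back into a line for `parse_row`, or the same field handling can be moved into a variant of `parse_row` that accepts a list of fields.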