# Unicode to ASCII converter

If you're working with legacy tools that only support ASCII, you have to convert accented Latin characters (or your entire alphabet, if it's not Latin) to plain ASCII equivalents.

This notebook uses the [Unidecode](https://pypi.org/project/Unidecode/) Python package to strip diacritics and do a basic character-to-character transliteration (so it's probably best to use something else for non-Western alphabets). It **overwrites** all .txt files in a specified directory with ASCII versions, and then optionally prepends "ascii_" to the filenames, so be sure you have a copy of the original files saved elsewhere before you run this.
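
Because the conversion overwrites files in place, one way to make that copy is from Python itself. The following is a minimal sketch, not part of the original notebook: the function name `backup_directory` and the `-backup` suffix are illustrative assumptions, and it relies only on the standard library's `shutil.copytree`.

```python
import shutil
from pathlib import Path


def backup_directory(source_dir, suffix="-backup"):
    """Copy source_dir to a sibling folder named <source_dir><suffix>.

    Hypothetical helper for illustration; the name and suffix are assumptions.
    """
    source = Path(source_dir)
    destination = source.with_name(source.name + suffix)
    # copytree raises FileExistsError if the destination already exists,
    # so an earlier backup is never silently overwritten.
    shutil.copytree(source, destination)
    return destination
```

Calling `backup_directory(sourcefiledirectory)` before running the conversion would leave an untouched copy next to the original folder.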

## 1. Setup
In this code block, set *sourcefiledirectory* to the full path of the folder where you've stored the files you want to convert.

In [16]:
#os is used for things like changing directories and listing files
import os

#glob finds files by pattern, e.g. *.txt (the code blocks below use os.listdir instead, so this import is optional)
import glob

#unidecode is used for changing unicode accented characters to an equivalent unaccented version
import unidecode

#io is used for opening and writing files
import io

#This is the full path to the directory where you've stored the source texts
sourcefiledirectory = '/Users/qad/Documents/texts-to-convert'

#Changes the directory to where you've stored the source texts, so you can open them in later code blocks
os.chdir(sourcefiledirectory)

## 2. Convert Unicode to ASCII
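This code block converts the Unicode characters to ASCII equivalents and overwrites the source files. The output is written with the ["Western" (ISO-8859-1) encoding](https://en.wikipedia.org/wiki/ISO/IEC_8859-1), which is a superset of ASCII: because the converted text contains only ASCII characters, the bytes written are the same as they would be in plain ASCII.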

In [17]:
#Look through the directory you specified to find files that end in .txt.
for filename in os.listdir(sourcefiledirectory):
    if filename.endswith(".txt"):
        #For each file that ends in .txt, open and read its contents into a string.
        #This assumes the source files are saved as UTF-8; change the encoding if yours are not.
        with io.open(filename, "r", encoding="utf-8") as f:
            text = f.read()
        #Replace accented characters with unaccented equivalents
        text = unidecode.unidecode(text)

        #Create a new file with the same file name (i.e. replacing the original file) and write the modified text
        #The with statement automatically closes the file once it's done
        with io.open(filename, "w", encoding="ISO-8859-1") as out:
            out.write(text)
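
After the conversion it can be reassuring to confirm that the files really contain nothing but ASCII. Here is a minimal sketch of such a check, assuming a hypothetical helper named `find_non_ascii_files` (not part of the original notebook) that reads each .txt file as raw bytes and flags any byte above 127.

```python
import os


def find_non_ascii_files(directory):
    """Return the names of .txt files in directory that contain a non-ASCII byte.

    Hypothetical helper for illustration only.
    """
    offenders = []
    for filename in sorted(os.listdir(directory)):
        if filename.endswith(".txt"):
            # Read raw bytes so the check does not depend on any text encoding
            with open(os.path.join(directory, filename), "rb") as f:
                data = f.read()
            # Every ASCII byte is in the range 0-127
            if any(byte > 127 for byte in data):
                offenders.append(filename)
    return offenders
```

If `find_non_ascii_files(sourcefiledirectory)` returns an empty list, every .txt file in the folder is plain ASCII.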
            

## 3. Rename files
Optional: run this code block to prepend *ascii_* to the filenames of the text files you've just converted. If the original filenames are fine, you can skip it.

In [18]:

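#Look through the directory again and rename every file that ends in .txt
for filename in os.listdir(sourcefiledirectory):
    #Skip files that already start with ascii_, so running this block twice doesn't add the prefix again
    if filename.endswith(".txt") and not filename.startswith("ascii_"):
        new_filename = "ascii_" + filename
        os.rename(filename, new_filename)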
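
Renaming is hard to undo by hand in a large folder, so it may help to preview the changes first. The sketch below assumes a hypothetical helper, `plan_renames`, that is not part of the original notebook. It only builds the list of old and new names and touches nothing on disk, and it leaves out files that already carry the prefix.

```python
import os


def plan_renames(directory, prefix="ascii_"):
    """Return (old_name, new_name) pairs for .txt files, without renaming anything.

    Hypothetical helper for illustration only.
    """
    plan = []
    for filename in sorted(os.listdir(directory)):
        # Only .txt files that do not already start with the prefix are included
        if filename.endswith(".txt") and not filename.startswith(prefix):
            plan.append((filename, prefix + filename))
    return plan
```

Printing the result of `plan_renames(sourcefiledirectory)` shows what the rename step would do before you commit to it.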