![Callysto.ca Banner](https://github.com/callysto/curriculum-notebooks/blob/master/callysto-notebook-banner-top.jpg?raw=true)

# Checking Notebook Spelling

This notebook logs possible spelling errors in the notebooks in the current folder and any subfolders.

First we'll create a DataFrame, `file_df`, listing all of the notebook files.

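```python
import os
import json
import pandas as pd
from spellchecker import SpellChecker  # provided by the pyspellchecker package

spell = SpellChecker()

# Walk the current folder and all subfolders, collecting notebook paths
# but skipping the autosave copies whose filenames contain 'checkpoint'
notebook_paths = []
for root, dirs, files in os.walk('.'):
    for filename in files:
        if filename.endswith('.ipynb') and 'checkpoint' not in filename:
            notebook_paths.append(os.path.join(root, filename))

# DataFrame.append was removed in pandas 2.0, so build the frame in one step
file_df = pd.DataFrame({'File': notebook_paths})
```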
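As an alternative, here is a minimal sketch of the same search written with `pathlib`. The function name `find_notebooks` is hypothetical. One design difference is deliberate: it skips anything inside a `.ipynb_checkpoints` folder instead of checking whether the filename contains "checkpoint". That way a notebook legitimately named something like `checkpoint_quiz.ipynb` would still be checked.

```python
from pathlib import Path


def find_notebooks(folder="."):
    """Return sorted paths of .ipynb files under folder, skipping checkpoint copies."""
    paths = []
    for path in Path(folder).rglob("*.ipynb"):
        # Jupyter keeps its autosave copies in .ipynb_checkpoints folders
        if ".ipynb_checkpoints" in path.parts:
            continue
        paths.append(str(path))
    return sorted(paths)


if __name__ == "__main__":
    for p in find_notebooks():
        print(p)
```

The result could then be passed straight to `pd.DataFrame({'File': find_notebooks()})`.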


Then we'll loop through `file_df`, check every line of each markdown cell (skipping the Callysto banner and license cells), and log possible spelling errors to a new DataFrame called `corrections`.

This can take a long time, because `spell.correction` is called on every word. For a large folder, you may want to work in chunks by replacing `file_df.iterrows()` with, for example, `file_df.iloc[0:50].iterrows()`.
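The chunking idea can be sketched as a small generator that yields consecutive slices of at most 50 items. The name `batches` is hypothetical, and a plain list stands in for the notebook paths so the sketch runs on its own.

```python
def batches(items, size=50):
    """Yield consecutive slices of items, each holding at most size entries."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


paths = [f"notebook_{n}.ipynb" for n in range(120)]
print([len(batch) for batch in batches(paths)])  # [50, 50, 20]
```

With the DataFrame, the same pattern would be `for start in range(0, len(file_df), 50):` followed by `chunk = file_df.iloc[start:start + 50]`, running the spelling loop over `chunk.iterrows()` each time.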

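```python
rows = []  # one dict per possible misspelling
for i, row in file_df.iterrows():
    path = row['File']
    print(i, path)  # progress indicator
    with open(path, encoding='utf-8') as f:
        notebook = json.load(f)
    for cell in notebook['cells']:
        if cell['cell_type'] != 'markdown':
            continue
        source = cell['source']
        # .ipynb files may store source as one string or as a list of lines
        if isinstance(source, str):
            source = source.splitlines(keepends=True)
        # skip empty cells and the Callysto banner and license cells
        if not source or source[0].startswith(('![Callysto.ca Banner', '[![Callysto.ca License')):
            continue
        for paragraph in source:
            for word in spell.split_words(paragraph):
                # recent pyspellchecker versions may return None when no candidate is found
                suggestion = spell.correction(word)
                if word != suggestion:
                    rows.append({'File': path, 'Paragraph': paragraph,
                                 'Word': word, 'Suggestion': suggestion})

corrections = pd.DataFrame(rows, columns=['File', 'Paragraph', 'Word', 'Suggestion'])
```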
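Many words, such as "the" or "notebook", appear in almost every notebook, so the same word is likely looked up over and over. One way to avoid that is to cache each lookup with `functools.lru_cache`. This is a sketch, not part of the original workflow: `cached_corrector` is a hypothetical helper, and `fake_correction` stands in for `spell.correction` so the example runs without pyspellchecker.

```python
from functools import lru_cache


def cached_corrector(correct):
    """Wrap a one-argument correction function so each distinct word is looked up once."""
    @lru_cache(maxsize=None)
    def lookup(word):
        return correct(word)
    return lookup


# Stand-in for spell.correction that records every word it is asked about
calls = []


def fake_correction(word):
    calls.append(word)
    return word.replace("teh", "the")


correction = cached_corrector(fake_correction)
for word in ["teh", "cat", "teh", "teh"]:
    correction(word)
print(len(calls))  # 2: "teh" and "cat" were each looked up only once
```

In the notebook this would mean defining `correction = cached_corrector(spell.correction)` once and calling `correction(word)` inside the loop instead of `spell.correction(word)`.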

Next, make a list of the files that have at least one possible spelling error, and count them.

```python
file_list = corrections['File'].unique()
len(file_list)
```

Once we start to look at the suggested corrections, though, we'll see that many of them should be ignored: technical terms and names such as `numpy` are flagged even though they are spelled correctly. Let's create a new DataFrame, `new_corrections`, that leaves out those words.

As we come across new words that we don't want to change, we can add them to this list and re-run the cell.

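```python
# Words that are flagged but are actually correct; add to this list as you go
good_words = ['applet', 'numpy', 'other words that are actually correct']
new_corrections = corrections[~corrections['Word'].isin(good_words)]

# Rebuild the file list so it only includes files that still have something to review
file_list = new_corrections['File'].unique()
```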
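The filter above compares words exactly. Depending on the pyspellchecker version, `split_words` may keep the original capitalization, in which case "NumPy" would still be flagged even with "numpy" in `good_words`. Here is a minimal, case-insensitive sketch that uses plain lists of dicts instead of a DataFrame. The name `drop_good_words` is hypothetical.

```python
def drop_good_words(rows, good_words):
    """Return the rows whose 'Word' is not in good_words, ignoring letter case."""
    allowed = {word.lower() for word in good_words}
    return [row for row in rows if row["Word"].lower() not in allowed]


rows = [{"Word": "NumPy"}, {"Word": "teh"}, {"Word": "Applet"}]
print(drop_good_words(rows, ["applet", "numpy"]))  # [{'Word': 'teh'}]
```

With pandas, the same idea would be `corrections[~corrections['Word'].str.lower().isin([w.lower() for w in good_words])]`.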

Now for the somewhat tedious part: displaying the words in one notebook that may need to be corrected. Open that notebook and make the corrections, then display the corrections for the next notebook, `file_list[1]`, and repeat.

As we find words that we don't want to correct, we can add them to `good_words` above so we don't need to keep scrolling past them.

If there are many rows, pandas will not display them all. Use `[:50]` to display the first 50 rows, then `[50:100]` to display the next 50.

```python
# Change file_list[0] to file_list[1], file_list[2], ... to move through the notebooks
new_corrections[new_corrections['File'] == file_list[0]]  # [50:100]
```
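Python slices stop just before their end index, so `[:49]` gives 49 rows while `[:50]` gives 50. This short check uses a plain list as a stand-in for the flagged rows. Slicing a DataFrame's rows with `[50:100]`, or with the explicit positional form `.iloc[50:100]`, follows the same rule.

```python
rows = list(range(120))  # stand-in for 120 flagged rows, numbered from 0

print(len(rows[:49]), len(rows[50:99]))    # 49 49: each slice stops before its end index
print(len(rows[:50]), len(rows[50:100]))   # 50 50
print(rows[50:100][0], rows[50:100][-1])   # 50 99
```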

[![Callysto.ca License](https://github.com/callysto/curriculum-notebooks/blob/master/callysto-notebook-banner-bottom.jpg?raw=true)](https://github.com/callysto/curriculum-notebooks/blob/master/LICENSE.md)