# Get food names from NLTK

Gets a list of food names from the NLTK WordNet corpus and saves it to file.

* Function to leverage NLTK word names - https://stackoverflow.com/questions/57057039/how-to-extract-all-words-in-a-noun-food-category-in-wordnet
* Downloading a list of many different words - https://raw.githubusercontent.com/AllenDowney/ThinkPython2/master/code/words.txt 

In [1]:
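# Download list of words
import os
import requests

def download_file(url, filename):
    """Downloads url to filename unless the file already exists."""
    if os.path.exists(filename):
        print(f"Skipping download, {filename} already exists.")
        return
    # Make sure the target directory (e.g. data/) exists
    os.makedirs(os.path.dirname(filename) or ".", exist_ok=True)
    try:
        response = requests.get(url, stream=True)
        response.raise_for_status()
    except requests.RequestException as e:
        print(f"Download failed for {url}: {e}")
        return
    # Write data to file in chunks
    with open(filename, "wb") as f:
        for block in response.iter_content(chunk_size=4096):
            f.write(block)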


In [2]:
download_file("https://raw.githubusercontent.com/AllenDowney/ThinkPython2/master/code/words.txt",
    filename="data/words.txt"
)

## Filter words for foods and not foods

Using code from Stack Overflow to filter for words that are nouns and *are* foods.

The following function is adapted from: https://stackoverflow.com/a/57060039/7900723

And: https://github.com/mrdbourke/food-not-food/issues/2

In [3]:
# Use the NLTK WordNet corpus to check whether a word is a noun and a food.
import nltk
nltk.download('wordnet')
from nltk.corpus import wordnet as wn

[nltk_data] Downloading package wordnet to /home/daniel/nltk_data...
[nltk_data]   Package wordnet is already up-to-date!


In [4]:
def food_not_food(word):
    """Returns 1 if a string is a food noun, otherwise 0.

    Args:
        word (str): target string to classify as food or not food.

    Returns:
        int: 1 for food, 0 for not_food.

    Example:
        >>> food_not_food("banana")
        1
    """
    syns = wn.synsets(str(word), pos=wn.NOUN) 
    # List of forbidden noun types
    forbidden = ["artifact", "object", "animal", "person"]
    for syn in syns:
        current_lex = syn.lexname().split(".")[-1]
        # Check to see if lexname is in list of forbidden,
        # if it is, return 0 for not_food.
        if current_lex in forbidden:
            return 0
        else:
            if "food" in syn.lexname():
                return 1
    return 0

food_not_food("banana")

1

In [5]:
# Inspect the lexname WordNet assigns to each noun sense of "torpedoes"
syns = wn.synsets("torpedoes", pos=wn.NOUN)
for syn in syns:
    print(syn, syn.lexname())

Synset('gunman.n.01') noun.person
Synset('bomber.n.03') noun.food
Synset('torpedo.n.03') noun.artifact
Synset('torpedo.n.04') noun.artifact
Synset('torpedo.n.05') noun.artifact
Synset('torpedo.n.06') noun.artifact
Synset('electric_ray.n.01') noun.animal
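
The output above shows why `food_not_food` stops at the first forbidden lexname: "torpedoes" has a `noun.food` sense, but its first sense is `noun.person`, so the word is labelled not_food. Below is a minimal sketch of that decision rule applied to plain lists of lexnames, so it runs without WordNet. The helper name `classify_lexnames` and the second and third example lists are illustrative assumptions, not part of the original notebook or real WordNet output.

```python
# Same forbidden categories as in food_not_food
FORBIDDEN = {"artifact", "object", "animal", "person"}

def classify_lexnames(lexnames):
    """Hypothetical helper: apply the food_not_food rule to a list of lexnames."""
    for lexname in lexnames:
        category = lexname.split(".")[-1]
        # A forbidden category seen first rules the word out immediately
        if category in FORBIDDEN:
            return 0
        if "food" in lexname:
            return 1
    # No food sense found
    return 0

# Lexnames copied from the "torpedoes" output above, in the same order
torpedoes = [
    "noun.person", "noun.food", "noun.artifact", "noun.artifact",
    "noun.artifact", "noun.artifact", "noun.animal",
]

print(classify_lexnames(torpedoes))                   # 0, noun.person comes before noun.food
print(classify_lexnames(["noun.plant", "noun.food"])) # 1, illustrative list with no forbidden sense
print(classify_lexnames(["noun.plant"]))              # 0, illustrative list with no food sense
```

Because the rule depends on sense order, a word whose food sense is listed before any forbidden sense is still labelled as food.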


In [6]:
# Import list of words
import pandas as pd
word_df = pd.read_csv("data/words.txt", header=None)
word_df.columns = ["word"]
word_df.head()

     word
0      aa
1     aah
2   aahed
3  aahing
4    aahs


In [7]:
# Create a column flagging whether each word is a food noun (1) or not (0)
word_df["is_food"] = word_df["word"].apply(food_not_food)
word_df.head()

     word  is_food
0      aa        0
1     aah        0
2   aahed        0
3  aahing        0
4    aahs        0


In [8]:
# How many foods and non-foods are there?
word_df["is_food"].value_counts()

0    111715
1      2068
Name: is_food, dtype: int64
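
As a quick check on how unbalanced the two classes are, the counts above can be turned into a proportion. This is only a small arithmetic sketch using the numbers printed by `value_counts()`; the variable names are illustrative.

```python
# Counts copied from the value_counts() output above
counts = {0: 111715, 1: 2068}

total = sum(counts.values())
food_share = counts[1] / total

print(f"{total} words, {food_share:.2%} labelled as food")
# 113783 words, 1.82% labelled as food
```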

In [9]:
# View random samples of foods
import random
random.sample(word_df[word_df["is_food"] == 1]["word"].tolist(), k=10)

['lichis',
 'potato',
 'dietaries',
 'hamburgers',
 'concoction',
 'meatballs',
 'goody',
 'souchong',
 'acetum',
 'raspberry']

In [10]:
# Build lists of foods and non-foods to export to file
food_list = word_df[word_df["is_food"]==1]["word"].tolist()
non_food_list = word_df[word_df["is_food"]==0]["word"].tolist()
food_list[:10]

['absinth',
 'absinthe',
 'absinthes',
 'absinths',
 'acerola',
 'acerolas',
 'acetum',
 'afters',
 'ail',
 'ails']

In [11]:
# Write lists to file
def write_to_file(targ_file, targ_list):
    with open(targ_file, "w") as f:
        for element in targ_list:
            f.write(str(element) + "\n")

In [12]:
write_to_file(targ_file="data/food_list.txt", targ_list=food_list)
write_to_file(targ_file="data/non_food_list.txt", targ_list=non_food_list)
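
To confirm the saved files can be loaded again, one option is a simple round trip: write a list, read it back, and compare. The sketch below repeats `write_to_file` so it is self-contained and uses a temporary directory instead of `data/`. The `read_from_file` helper and the sample list are assumptions added for illustration, not part of the original notebook.

```python
import os
import tempfile

def write_to_file(targ_file, targ_list):
    # Same logic as the notebook: one element per line
    with open(targ_file, "w") as f:
        for element in targ_list:
            f.write(str(element) + "\n")

def read_from_file(targ_file):
    """Hypothetical helper: read one item per line, dropping the newline."""
    with open(targ_file) as f:
        return [line.rstrip("\n") for line in f]

# Small sample taken from the food list shown earlier
sample_foods = ["absinth", "acerola", "acetum"]

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "food_list.txt")
    write_to_file(path, sample_foods)
    loaded = read_from_file(path)

print(loaded == sample_foods)
```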

To find more food and non-food names, the following link may help (for example, finding all nouns related to the verb "eat"): https://stackoverflow.com/questions/30081982/get-noun-from-verb-wordnet