# Analyzing Python Programs That Use PyTamaro: An AST Approach


## Abstract

This project explores the use of Abstract Syntax Trees (ASTs) to analyze Python programs that utilize the PyTamaro library. PyTamaro is an educational library that provides a simple interface for creating and composing graphics.

From the ASTs of Python programs that use PyTamaro, we can extract information about the usage of the library. In particular, we are going to extract all the functions, primitives, constants and operations that are used in the programs. Then, we are going to map each of these elements to the corresponding TamaroCards.

TamaroCards are a set of cards that represent all the elements of the PyTamaro library. They are used in the educational context to teach programming concepts and the use of the PyTamaro library in an unplugged way.

The goal of this project is to provide a tool that can be used by teachers to analyze any given Python program that uses PyTamaro and automatically generate printable sets of the necessary TamaroCards.

The PyTamaro library is available in the following languages:

- English
- Italian
- German
- French

This creates the first challenge we need to address...

## How can we analyze Python programs that use PyTamaro in different languages?

If a source file contains a call to the function `kreis_sektor` from the `pytamaro.de` package, how can we know that it is actually the same function as `settore_circolare` in the `pytamaro.it` package?

The solution is to create a mapping between the names present in the PyTamaro library from the different languages to a common one.

This simplifies the analysis of the various user programs later on, as we will be able to get programs written in any of the PyTamaro supported languages and map them to the common language.

The common language we are going to use is English, which is the base language of the PyTamaro library.
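
As a minimal sketch of the idea, assume a hand-written toy mapping that contains only the two names quoted above. The names `TOY_MAPPING` and `to_english` are hypothetical and used only for illustration.

```python
# Toy mapping from localized PyTamaro names to their English counterparts.
# Only the two names quoted in the text are included; the real mapping is
# extracted automatically from the library later on.
TOY_MAPPING: dict[str, str] = {
    "kreis_sektor": "circular_sector",       # pytamaro.de
    "settore_circolare": "circular_sector",  # pytamaro.it
    "circular_sector": "circular_sector",    # English maps to itself
}


def to_english(name: str) -> str:
    """Return the English name, or the name unchanged if it is unknown."""
    return TOY_MAPPING.get(name, name)


# Both localized names resolve to the same English name.
print(to_english("kreis_sektor") == to_english("settore_circolare"))
```

Mapping English names to themselves keeps the lookup uniform: a program written in English goes through the same code path as one written in Italian or German.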

We could create this mapping manually, but that would be tedious and error-prone. Instead, we are going to use the set of "wrapper" files that the PyTamaro library uses to expose its functionality in the different languages.

The list of these files is:

- color_names
- color
- graphic
- io
- operations
- point_names
- point
- primitives

These files are present in all the different language versions:

- English
- Italian
- German
- French

These files already contain the mapping between the original English names and the target language: each one wraps all the functions and re-defines all the constants while keeping a reference to the original English version.
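
To make this structure concrete, the sketch below parses a hypothetical wrapper file. The module aliases `colors_en` and `primitives_en`, the parameter names and the function body are assumptions made for illustration; they are not copied from the PyTamaro source.

```python
import ast

# Hypothetical wrapper source: the module aliases, signature and body are
# assumptions made for illustration, not copied from PyTamaro.
WRAPPER_SOURCE = '''
nero = colors_en.black

def settore_circolare(raggio, angolo, colore):
    return primitives_en.circular_sector(raggio, angolo, colore)
'''

tree = ast.parse(WRAPPER_SOURCE)
assignment, function = tree.body

# A re-defined constant is an Assign whose value is an Attribute node.
print(assignment.targets[0].id, "->", assignment.value.attr)

# A wrapped function ends with a Return whose value is a Call on an Attribute.
last_statement = function.body[-1]
print(function.name, "->", last_statement.value.func.attr)
```

In both cases the English name is the `attr` field of an `ast.Attribute` node, which is the property an automatic extraction can rely on.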

Now we need a way to extract this information from the PyTamaro library and use it for our analysis.

### Extracting the mapping from the PyTamaro library

To extract the mapping from the PyTamaro library, we are going to create an AST of all the wrapper files, extract all the functions, constants and operations that appear as nodes in the AST, and save them in a dictionary that can be used later on.

We can organize the various steps that are required to create the translation dictionary in the following way:

1. Load the various wrapper files from the PyTamaro library
2. Create an AST version of the combined wrapper files
3. Extract all the functions, constants and operations from the AST
4. Save the extracted information in a dictionary

#### Loading the wrapper files and creating the AST

We are going to use Python's `ast` module to create the AST representation of the wrapper files. The `ast` module provides a simple way to create and manipulate the abstract syntax tree of any Python program.

In [2]:
import ast
import inspect
import importlib

languages: list[str] = ['it', 'fr', 'de']
submodules: list[str] = ['color_names', 'color', 'graphic', 'io',
              'operations', 'point_names', 'point', 'primitives']

source_code: str = ''

for lang in languages:
    for submodule in submodules:
        # Generate the module path based on the language and the submodule
        module_path = f'pytamaro.{lang}.{submodule}'
        try:
            # Dynamically import the module
            module = importlib.import_module(module_path)
            # Get the source code and append it to the source_code string
            source_code += inspect.getsource(module)
        except Exception as e:
            print(f'Error with {module_path}: {e}')

# Generate the ast version of the source code
pytamaro_ast = ast.parse(source_code)

#### NodeVisitor class for extracting the functions, constants and operations

Now that we have the AST representation of the wrapper files, we can create a custom `NodeVisitor` class that extracts all the functions, constants and operations from the AST. The class traverses the AST and, at each node we are interested in, adds the corresponding PyTamaro element to the `translations` dictionary.

This dictionary maps every name to its English counterpart: `translations["localized_name"] = "english_name"`. English names map to themselves.

In [3]:
class PyTamaroTranslatorVisitor(ast.NodeVisitor):
    def __init__(self: ast.NodeVisitor) -> None:
        self.translations: dict[str, str] = {}

    def is_pytamaro_word(self, word: str) -> bool:
        return word in self.translations

    def translate(self: ast.NodeVisitor, name: str) -> str:
        return self.translations.get(name, name)

    def visit_AnnAssign(self, node: ast.AnnAssign) -> None:
        self.translations[node.target.id] = node.value.attr
        self.translations[node.value.attr] = node.value.attr

    def visit_Assign(self, node: ast.Assign) -> None:
        self.translations[node.targets[0].id] = node.value.attr
        self.translations[node.value.attr] = node.value.attr

    def visit_FunctionDef(self: ast.NodeVisitor, node: ast.FunctionDef) -> None:
        # Get the last statement
        # This leverages the fact that in pytamaro, the last statement is always the return statement
        # or a function call (expression)
        statement = node.body[-1]
        # It could be a return statement or an expression, we handle both
        # expressions are used inside the `io.py` file of the various languages
        if isinstance(statement, (ast.Return, ast.Expr)):
            # Both node types store the wrapped call in `value`
            self.translations[node.name] = statement.value.func.attr
            self.translations[statement.value.func.attr] = statement.value.func.attr

We can now use our `PyTamaroTranslatorVisitor` class to extract the language mapping from the PyTamaro library, and then take a look at the resulting dictionary.

In [4]:
import json

# Create a new instance of the NodeVisitor
translator_visitor = PyTamaroTranslatorVisitor()

# Visit the ast
translator_visitor.visit(pytamaro_ast)

# Print the translations
print(json.dumps(translator_visitor.translations, indent=4))

{
    "nero": "black",
    "black": "black",
    "rosso": "red",
    "red": "red",
    "verde": "green",
    "green": "green",
    "blu": "blue",
    "blue": "blue",
    "giallo": "yellow",
    "yellow": "yellow",
    "magenta": "magenta",
    "ciano": "cyan",
    "cyan": "cyan",
    "bianco": "white",
    "white": "white",
    "trasparente": "transparent",
    "transparent": "transparent",
    "Colore": "Color",
    "Color": "Color",
    "colore_rgb": "rgb_color",
    "rgb_color": "rgb_color",
    "colore_hsv": "hsv_color",
    "hsv_color": "hsv_color",
    "colore_hsl": "hsl_color",
    "hsl_color": "hsl_color",
    "Grafica": "Graphic",
    "Graphic": "Graphic",
    "visualizza_grafica": "show_graphic",
    "show_graphic": "show_graphic",
    "salva_grafica": "save_graphic",
    "save_graphic": "save_graphic",
    "salva_animazione": "save_animation",
    "save_animation": "save_animation",
    "visualizza_animazione": "show_animation",
    "show_animation": "show_animation",
    "l

We also included two utility functions:
- `translate` that allows us to translate a single term to the corresponding English term.
  - Returns the translated term if it is present in the dictionary.
  - If the term is not present in the dictionary, it returns the original term.
- `is_pytamaro_word` that allows us to check if a given term is present in the dictionary.
  - Returns `True` if the term is present, `False` otherwise.

In [5]:
print(translator_visitor.translate('settore_circolare'))
print(translator_visitor.translate('haut_centre'))
print(translator_visitor.translate('zeige_grafik'))
print(translator_visitor.translate('asdasdasdasd'))

print()

print(translator_visitor.is_pytamaro_word('settore_circolare'))
print(translator_visitor.is_pytamaro_word('fubar'))

circular_sector
top_center
show_graphic
asdasdasdasd

True
False
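
Since every spelling maps to one English name, the dictionary can also be inverted to list all known spellings of an element. The sketch below does this on a few entries copied from the output above; `sample_translations` and `names_by_english` are hypothetical names.

```python
from collections import defaultdict

# A few entries copied from the dictionary printed above.
sample_translations = {
    "nero": "black",
    "black": "black",
    "rosso": "red",
    "red": "red",
    "visualizza_grafica": "show_graphic",
    "show_graphic": "show_graphic",
}

# Group every known spelling under its English name.
names_by_english: dict[str, list[str]] = defaultdict(list)
for localized, english in sample_translations.items():
    names_by_english[english].append(localized)

print(dict(names_by_english))
```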


## Analyzing Python programs

Now that we have a way of translating the various PyTamaro terms to English, we can start working on the analysis of the Python programs.

We are again going to use the `ast` module and create a custom `NodeVisitor` class that extracts various pieces of information from the AST of the Python programs.

We are going to gather all the terms that are present in the PyTamaro library and the various operators and builtin Python functions that are used in the programs.

The information we are going to extract is:
- Function definitions and calls
- Constants definitions and usage
- Binary, Unary and Boolean operators
- If-else statements
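
Before the full visitor, here is a much smaller sketch of the same technique, restricted to binary operators and conditional expressions. The class name `OperatorCounter` is hypothetical; the naming rule (lowercased AST class name) is the one the full visitor uses for operator cards.

```python
import ast
from collections import Counter


class OperatorCounter(ast.NodeVisitor):
    """Count operators by the lowercased class name of their AST node."""

    def __init__(self) -> None:
        self.counts: Counter[str] = Counter()

    def visit_BinOp(self, node: ast.BinOp) -> None:
        # ast.Mult becomes "mult", ast.Sub becomes "sub", and so on.
        self.counts[type(node.op).__name__.lower()] += 1
        # Keep traversing, so that nested operators are counted too.
        self.generic_visit(node)

    def visit_IfExp(self, node: ast.IfExp) -> None:
        self.counts["ifexp"] += 1
        self.generic_visit(node)


counter = OperatorCounter()
counter.visit(ast.parse("size = 2 * radius - 1 if big else radius / 2"))
print(dict(counter.counts))
```

Each `visit_*` method must call `generic_visit` itself; otherwise the traversal stops at that node and operators nested inside it are never seen.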

In [6]:
import builtins


class UserProgramVisitor(ast.NodeVisitor):
    def __init__(self) -> None:
        self.tamaro_cards: dict[str, int] = {}
        self.user_defined_functions: list[str] = []
        self.pytamaro_python_used_functions: set[str] = set()
        self.excluded_names: set[str] = set()
        self._collected_user_functions: bool = False

    def _is_pytamaro_type(self, name: str) -> bool:
        return name[0].isupper()

    def _add_tamaro_card(self, name: str) -> None:
        if name not in self.excluded_names:
            self.tamaro_cards[name] = self.tamaro_cards.get(name, 0) + 1

    def _collect_user_defined_functions(self, node: ast.AST) -> None:
        for child in ast.walk(node):
            if isinstance(child, ast.FunctionDef):
                self.user_defined_functions.append(child.name)
                for arg in child.args.args:
                    self.excluded_names.add(arg.arg)
                    if isinstance(arg.annotation, ast.Name):
                        self.excluded_names.add(arg.annotation.id)
                    elif isinstance(arg.annotation, ast.Subscript):
                        self.excluded_names.add(arg.annotation.value.id)

    def visit(self, node: ast.AST) -> None:
        if not self._collected_user_functions:
            self._collect_user_defined_functions(node)
            self._collected_user_functions = True
        super().visit(node)

    def visit_FunctionDef(self, node: ast.FunctionDef) -> None:
        self._add_tamaro_card("function-def")
        self.user_defined_functions.append(node.name)
        for arg in node.args.args:
            # Here we add the name of the parameter and its type to the excluded names set
            self.excluded_names.add(arg.arg)
            if isinstance(arg.annotation, ast.Name):
                self.excluded_names.add(arg.annotation.id)
            elif isinstance(arg.annotation, ast.Subscript):
                self.excluded_names.add(arg.annotation.value.id)
        super().generic_visit(node)

    def visit_Assign(self, node: ast.Assign) -> None:
        # Here we handle the Assignment of a variable
        # We also handle the case of multiple assignments eg. a, b = 1, 2
        for target in node.targets:
            if isinstance(target, ast.Name):
                self._add_tamaro_card("constant-def")
                self.excluded_names.add(target.id)
            elif isinstance(target, ast.Tuple):
                for elt in target.elts:
                    self._add_tamaro_card("constant-def")
                    self.excluded_names.add(elt.id)
        super().generic_visit(node)

    def _is_standard_library_function(self, func_name: str) -> bool:
        if func_name in dir(builtins):
            return True
        # Only dotted names such as "module.function" can be resolved further
        parts = func_name.split(".")
        if len(parts) < 2:
            return False
        try:
            module = importlib.import_module(parts[0])
            # getattr avoids calling eval on text taken from the analyzed program
            func = getattr(module, parts[1])
            return inspect.ismodule(module) and inspect.isfunction(func)
        except (ImportError, AttributeError):
            return False

    def visit_Call(self, node: ast.Call) -> None:
        # Here we handle the Call of a function
        # We check if the function is user defined or not, to choose the corresponding card
        # Between the generic USE-function# or the function specific card
        if isinstance(node.func, ast.Name):
            if node.func.id in self.user_defined_functions:
                n_args = min(len(node.args), 3)
                self._add_tamaro_card(f"function-use{n_args}")
                self.excluded_names.add(node.func.id)
            else:
                if translator_visitor.is_pytamaro_word(
                    node.func.id
                ) or self._is_standard_library_function(node.func.id):
                    translated_name = translator_visitor.translate(node.func.id)
                    self._add_tamaro_card(translated_name)
                    self.pytamaro_python_used_functions.add(translated_name)
        super().generic_visit(node)

    def visit_Constant(self, node: ast.Constant) -> None:
        self._add_tamaro_card("constant-use")
        super().generic_visit(node)

    def visit_Name(self, node: ast.Name) -> None:
        translated_name = translator_visitor.translate(node.id)
        if (
            translated_name not in self.pytamaro_python_used_functions
            and not self._is_pytamaro_type(translated_name)
            and (
                translator_visitor.is_pytamaro_word(translated_name)
                or self._is_standard_library_function(translated_name)
            )
        ):
            self._add_tamaro_card(translated_name)
        super().generic_visit(node)

    def visit_For(self, node: ast.For) -> None:
        if isinstance(node.target, ast.Name):
            self.excluded_names.add(node.target.id)
        super().generic_visit(node)

    """
    ---Operators
    """

    def _generic_operator_visit(self, node: ast.AST, to_visit: ast.AST) -> None:
        operator = node.__class__.__name__.lower()
        self._add_tamaro_card(operator)
        super().generic_visit(to_visit)

    def visit_BinOp(self, node: ast.BinOp) -> None:
        self._generic_operator_visit(node.op, node)

    def visit_UnaryOp(self, node: ast.UnaryOp) -> None:
        self._generic_operator_visit(node.op, node)

    def visit_BoolOp(self, node: ast.BoolOp) -> None:
        self._generic_operator_visit(node.op, node)

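    def visit_Compare(self, node: ast.Compare) -> None:
        # Add one card per comparison operator, then visit the operands once;
        # visiting them inside the loop would count chained comparisons twice
        for op in node.ops:
            self._add_tamaro_card(op.__class__.__name__.lower())
        super().generic_visit(node)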

    def visit_IfExp(self, node: ast.IfExp) -> None:
        self._generic_operator_visit(node, node)

### Loading the example Python programs
We are going to load some example Python programs that use the PyTamaro library. These programs are written in the different supported languages and are going to be used to test our analysis tool.

In [7]:
examples_folder = "example_codes/"

pacman_it = ""
with open(examples_folder + "pacman_it.py", "r") as f:
    pacman_it = f.read()

pacman_en = ""
with open(examples_folder + "pacman_en.py", "r") as f:
    pacman_en = f.read()

heart_en = ""
with open(examples_folder + "heart_en.py", "r") as f:
    heart_en = f.read()

### Testing the UserProgramVisitor
Now we can pass the AST version of the example programs to the `UserProgramVisitor` and get as output the set of cards that are used in the programs.

In [8]:
user_program_ast = ast.parse(pacman_it)

user_program_visitor = UserProgramVisitor()

# This will first collect all the user defined functions
# and then collect all the other nodes
user_program_visitor.visit(user_program_ast)

print(
    json.dumps(
        user_program_visitor.tamaro_cards,
        indent=4,
    )
)

{
    "function-def": 1,
    "constant-def": 1,
    "pow": 1,
    "constant-use": 6,
    "rotate": 1,
    "div": 1,
    "circular_sector": 1,
    "sub": 1,
    "yellow": 1,
    "show_graphic": 1,
    "function-use1": 1
}


## Stitching together the TamaroCards

Now that we have the set of cards that are used in a given Python program, we can combine them into a printable, easy-to-cut format.

### Renaming of the files to create a mapping between the TamaroCards and the Dictionary

We have the set of available cards inside the `cards` folder. If the folder has just been downloaded from the PyTamaro website, we first need to do a bit of renaming to make the cards compatible with our analysis tool.

In [9]:
import os

names_mapping = {
    "plus": "add",
    "divide": "div",
    "equal": "eq",
    "integer-divide": "floordiv",
    "greater-than": "gt",
    "greater-or-equal": "gte",
    "if-else": "ifexp",
    "less-than": "lt",
    "less-or-equal": "lte",
    "remainder": "mod",
    "times": "mult",
    "not-equal": "noteq",
    "power": "pow",
    "minus": "sub",
    "unary-plus": "uadd",
    "unary-minus": "usub",
}

# Folder containing the cards
cards_folder = "cards"

# rename all the files inside the folder 'cards' with their matching name
# and remove the files that are not .svg
for root, dirs, files in os.walk(cards_folder):
    for file in files:
        # just the files ending in .svg
        if file.endswith(".svg"):
            if file.split(".")[0] in names_mapping:
                new_name = names_mapping.get(file.split(".")[0], file.split(".")[0])
                os.rename(
                    os.path.join(root, file), os.path.join(root, f"{new_name}.svg")
                )
                print(f"Renamed {file} to {new_name}.svg")
        else:
            # remove the files that are not .svg
            os.remove(os.path.join(root, file))
            print(f"Removed {file}")
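
Because the cell above renames and deletes files in place, it can be useful to preview the changes first. The following is a sketch of such a dry run, not part of the original tool: `plan_renames` and `sample_mapping` are hypothetical names, and the file names are made up.

```python
import tempfile
from pathlib import Path

# A subset of the names_mapping dictionary defined above.
sample_mapping = {"plus": "add", "times": "mult"}


def plan_renames(folder: Path, mapping: dict[str, str]) -> list[tuple[Path, Path]]:
    """List (old, new) path pairs for .svg files without touching the disk."""
    planned = []
    for path in sorted(folder.rglob("*.svg")):
        if path.stem in mapping:
            planned.append((path, path.with_name(f"{mapping[path.stem]}.svg")))
    return planned


# Try it on a throwaway folder with made-up file names.
with tempfile.TemporaryDirectory() as tmp:
    folder = Path(tmp)
    for name in ("plus.svg", "times.svg", "rotate.svg", "notes.txt"):
        (folder / name).touch()
    plan = [(old.name, new.name) for old, new in plan_renames(folder, sample_mapping)]

print(plan)
```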

### Creating the printable cards pages

Now we are ready to create the printable pages by combining the cards that are used in the programs.

We create a utility function that takes as input the dictionary of cards, generated from the `UserProgramVisitor`, and the folder containing the cards. This function `get_tamaro_svgs` gathers all the needed svgs and adds them into a list.

We sort the list of svgs with the `order_svgs` function to better generate the printable pages.

Finally, we use the `create_svg_stack` function, which takes the list of svgs and a few layout parameters, to create the printable pages.

The cards are arranged on several pages, and on each page we try to align the cards so that they can be easily cut.
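
The arrangement follows a greedy row-by-row strategy. The sketch below illustrates that strategy on plain `(width, height)` pairs; it is a simplified illustration and not the `create_svg_stack` implementation. The function name `paginate` is hypothetical, and padding is ignored to keep the logic short.

```python
def paginate(
    sizes: list[tuple[float, float]], page_width: float, page_height: float
) -> list[list[list[int]]]:
    """Greedily pack card indices into rows, and rows into pages."""
    pages: list[list[list[int]]] = [[]]
    row: list[int] = []
    row_width = row_height = used_height = 0.0

    def flush_row() -> None:
        """Move the current row onto the current page, or onto a new one."""
        nonlocal row, row_width, row_height, used_height
        if not row:
            return
        if pages[-1] and used_height + row_height > page_height:
            pages.append([])
            used_height = 0.0
        pages[-1].append(row)
        used_height += row_height
        row, row_width, row_height = [], 0.0, 0.0

    for index, (width, height) in enumerate(sizes):
        # Close the row when the next card would exceed the page width.
        if row and row_width + width > page_width:
            flush_row()
        row.append(index)
        row_width += width
        row_height = max(row_height, height)
    flush_row()
    return pages


# Five identical 60x40 cards on a 130x90 page: two per row, two rows per page.
pages = paginate([(60, 40)] * 5, page_width=130, page_height=90)
print(pages)
```

Sorting the cards by size beforehand tends to put cards of equal height in the same row, which is what makes the resulting pages easy to cut.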

In [10]:
import svg_stack as ss
from lxml import etree
import os


def get_tamaro_svgs(
    tamaro_cards: dict[str, int], src: str = "cards"
) -> list[etree._ElementTree]:
    # cards could be in subfolders
    svgs = []
    for root, dirs, files in os.walk(src):
        for file in files:
            if file.split(".")[0] in tamaro_cards:
                for _ in range(tamaro_cards[file.split(".")[0]]):
                    svg = etree.parse(os.path.join(root, file))
                    svgs.append(svg)
    return svgs


def order_svgs(svgs: list[etree._ElementTree]) -> list[etree._ElementTree]:
    # Order the svgs by their width and height attributes (compared as strings)
    return sorted(
        svgs,
        key=lambda svg: (svg.getroot().attrib["width"], svg.getroot().attrib["height"]),
    )


def create_svg_stack(
    svgs: list[etree._ElementTree],
    output_folder: str = ".",
    scale: float = 1.0,
    h_padding: float = 5.0,
    v_padding: float = 5.0,
) -> None:
    import os

    A4_WIDTH = 1200
    A4_HEIGHT = 800

    page_number = 0
    svg = ss.Document()
    page_layout = ss.VBoxLayout()
    row_layout = ss.HBoxLayout()
    page_layout.setSpacing(v_padding)
    row_layout.setSpacing(h_padding)

    for card in svgs:
        # Create a new svg that is resized by the scale factor
        root = card.getroot()
        # get the last 2 values of the viewBox attribute
        viewBox = root.attrib["viewBox"].split(" ")

        svg_width = float(viewBox[2]) * scale
        root.attrib["width"] = str(svg_width)
        svg_height = float(viewBox[3]) * scale
        root.attrib["height"] = str(svg_height)
        # Save the modified svg in a temp file
        svg_string = etree.tostring(root).decode()
        with open("__temp.svg", "w") as f:
            f.write(svg_string)

        # Check if the card fits in the row
        # if not add the row to the page layout
        # if the page layout is full save the page and create a new one
        if row_layout.get_size().width + svg_width + h_padding > A4_WIDTH:
            page_layout.addLayout(row_layout)
            row_layout = ss.HBoxLayout()
            row_layout.setSpacing(h_padding)
        if page_layout.get_size().height + svg_height + v_padding > A4_HEIGHT:
            svg.setLayout(page_layout)
            svg.save(f"{output_folder}/tamaroCards_{page_number}.svg")
            page_layout = ss.VBoxLayout()
            page_layout.setSpacing(v_padding)
            page_number += 1

        row_layout.addSVG("__temp.svg", alignment=ss.AlignHCenter | ss.AlignVCenter)
        # Remove the temp file
        os.remove("__temp.svg")

    if row_layout.get_size().width > 0:
        page_layout.addLayout(row_layout)
    if page_layout.get_size().height > 0:
        svg.setLayout(page_layout)
        svg.save(f"{output_folder}/tamaroCards_{page_number}.svg")


svgs = get_tamaro_svgs(user_program_visitor.tamaro_cards)
sorted_svgs = order_svgs(svgs)
# if the folder 'output' does not exist create it
if not os.path.exists("output"):
    os.makedirs("output")

create_svg_stack(sorted_svgs, output_folder="output", scale=0.22, v_padding=3)
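
Note that `order_svgs` compares the `width` and `height` attributes as strings. Whether this matters depends on how the card files store their sizes, which is not shown here. The sketch below only illustrates the difference between string and numeric ordering; `numeric_prefix` is a hypothetical helper that ignores any unit suffix.

```python
import re


def numeric_prefix(value: str) -> float:
    """Parse the leading number of an SVG length such as "120" or "63.5mm"."""
    match = re.match(r"\s*([0-9]*\.?[0-9]+)", value)
    if match is None:
        raise ValueError(f"no leading number in {value!r}")
    return float(match.group(1))


widths = ["100", "20", "63.5mm"]
print(sorted(widths))                      # string comparison
print(sorted(widths, key=numeric_prefix))  # numeric comparison
```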

---

## Student code analysis

### Loading the student code

In [11]:
import os, ast

analysis: dict[int, dict] = {}
source_code_dir = "students-code"
# get number of folders inside students-code
n_students_code = len(os.listdir(source_code_dir))
general_program_visitor = UserProgramVisitor()

for root, dirs, files in os.walk(source_code_dir):
    for index, dir_name in enumerate(dirs):
        user_code = ""
        for file in os.listdir(os.path.join(root, dir_name)):
            if file == "cell.py":
                with open(os.path.join(root, dir_name, file), "r") as f:
                    temp_code = f.read()
                    if temp_code != "" and temp_code != "pass":
                        user_code += temp_code + "\n\n"
        student_id: int = int(dir_name)
        analysis[student_id] = {}
        analysis[student_id]["code"] = user_code
        try:
            temp_user_program_ast = ast.parse(user_code)
            analysis[student_id]["error"] = None
            temp_user_program_visitor = UserProgramVisitor()
            temp_user_program_visitor.visit(temp_user_program_ast)
            general_program_visitor.visit(temp_user_program_ast)
            analysis[student_id]["tamaro_cards"] = temp_user_program_visitor.tamaro_cards
            analysis[student_id]["user_defined_functions"] = temp_user_program_visitor.user_defined_functions
            analysis[student_id]["pytamaro_python_used_functions"] = temp_user_program_visitor.pytamaro_python_used_functions
        except Exception as e:
            analysis[student_id]["error"] = str(e)
        if index % 10000 == 0:
            print(f"Processed {index + 1}/{n_students_code} students")

Processed 1/359389 students
Processed 10001/359389 students
Processed 20001/359389 students
Processed 30001/359389 students
Processed 40001/359389 students
Processed 50001/359389 students
Processed 60001/359389 students
Processed 70001/359389 students
Processed 80001/359389 students
Processed 90001/359389 students
Processed 100001/359389 students
Processed 110001/359389 students
Processed 120001/359389 students
Processed 130001/359389 students
Processed 140001/359389 students
Processed 150001/359389 students
Processed 160001/359389 students
Processed 170001/359389 students
Processed 180001/359389 students
Processed 190001/359389 students
Processed 200001/359389 students
Processed 210001/359389 students
Processed 220001/359389 students
Processed 230001/359389 students
Processed 240001/359389 students
Processed 250001/359389 students
Processed 260001/359389 students
Processed 270001/359389 students
Processed 280001/359389 students
Processed 290001/359389 students
Processed 300001/359389 students
Processed 310001/359389 students
Processed 320001/359389 students
Processed 330001/359389 students
Processed 340001/359389 students
Processed 350001/359389 students


### Exporting the analysis dictionary to a JSON file

In [12]:
# convert the pytamaro_python_used_functions set to a list for each student
for student_id, student_analysis in analysis.items():
    if "pytamaro_python_used_functions" in student_analysis:
        student_analysis["pytamaro_python_used_functions"] = list(student_analysis["pytamaro_python_used_functions"])


# export the resulting analysis to a json file
with open("analysis.json", "w") as f:
    json.dump(analysis, f, indent=4)
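
The conversion is needed because the `json` module cannot serialize sets. An alternative is to pass a fallback through the `default` parameter of `json.dump` or `json.dumps`. This is a sketch on a hypothetical record shaped like one entry of the analysis dictionary; `encode_sets` is a made-up helper name.

```python
import json

# Hypothetical record shaped like one entry of the analysis dictionary.
record = {
    "tamaro_cards": {"rotate": 1},
    "pytamaro_python_used_functions": {"show_graphic", "rotate"},
}


def encode_sets(value: object) -> list:
    """Fallback used by json for objects it cannot serialize natively."""
    if isinstance(value, (set, frozenset)):
        # Sorting makes the exported file stable across runs.
        return sorted(value)
    raise TypeError(f"{type(value).__name__} is not JSON serializable")


encoded = json.dumps(record, default=encode_sets)
print(encoded)
```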


### Error analysis

In [13]:
# count all the errors
errors: dict[int, str] = {}
for student_id, data in analysis.items():
    if data["error"] is not None:
        errors[student_id] = data["error"]

print(f"Errors: {len(errors)}/{n_students_code}")

Errors: 38722/359389
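
To put this number in perspective, the sketch below computes the error rate from the totals above and shows one possible way to group error messages. The entries in `sample_errors` are hypothetical examples, not messages taken from the data set.

```python
from collections import Counter

# Totals copied from the output above.
n_errors, n_programs = 38722, 359389
print(f"Error rate: {n_errors / n_programs:.2%}")

# Hypothetical error strings, shaped like str(SyntaxError) messages.
sample_errors = {
    1: "invalid syntax (<unknown>, line 3)",
    2: "unexpected indent (<unknown>, line 1)",
    3: "invalid syntax (<unknown>, line 7)",
}

# Drop the location suffix so that equal messages are counted together.
by_message = Counter(message.split(" (")[0] for message in sample_errors.values())
print(by_message.most_common())
```

Since the loading cell catches every `Exception`, the error count can include failures raised by the visitor itself in addition to programs that do not parse.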


In [None]:
# for student_id, error in errors.items():
#     print(f"{student_id}: {error}")

### Used cards analysis

In [14]:
sorted_general_cards = dict(
    sorted(
        general_program_visitor.tamaro_cards.items(),
        key=lambda item: item[1],
        reverse=True,
    )
)

print(len(sorted_general_cards))

print(
    json.dumps(
        sorted_general_cards,
        indent=4,
    )
)



111
{
    "constant-use": 2461732,
    "constant-def": 1523718,
    "function-use3": 500479,
    "function-use2": 460882,
    "function-use1": 323358,
    "function-def": 199609,
    "mult": 191310,
    "range": 58898,
    "sub": 37873,
    "floordiv": 35698,
    "usub": 24350,
    "div": 16914,
    "function-use0": 15905,
    "hsv_color": 15626,
    "eq": 13942,
    "len": 8914,
    "add": 8156,
    "mod": 7589,
    "lte": 6511,
    "compose": 6420,
    "gte": 5496,
    "print": 4161,
    "gt": 3981,
    "rgb_color": 3722,
    "lt": 3363,
    "show_animation": 3200,
    "bottom_right": 2891,
    "and": 2827,
    "pow": 2301,
    "empty_graphic": 2293,
    "hsl_color": 2070,
    "overlay": 2027,
    "circular_sector": 1886,
    "bottom_center": 1252,
    "ellipse": 1200,
    "pin": 855,
    "ifexp": 760,
    "top_right": 728,
    "top_center": 718,
    "transparent": 628,
    "enumerate": 621,
    "save_animation": 589,
    "or": 516,
    "graphic_width": 440,
    "top_left": 371,
    

### Used fonts analysis

In [15]:
import builtins
from collections import Counter
import ast

class FontVisitor(ast.NodeVisitor):
    def __init__(self) -> None:
        self.fonts: dict[str, int] = {}
    
    def visit_Call(self, node: ast.Call) -> None:
        # Here we handle the calls to PyTamaro's `text` function (in any supported language)
        # When the call has exactly 4 positional arguments, the second one is the font
        # We count its value, or "N/A" if the argument node has no `value` attribute
        if isinstance(node.func, ast.Name):
            translated_name = translator_visitor.translate(node.func.id)
            if translated_name == "text":
                if len(node.args) == 4:
                    if "value" in node.args[1].__dict__:
                        key = node.args[1].value
                    else:
                        key = "N/A"
                    self.fonts[key] = self.fonts.get(key, 0) + 1
        super().generic_visit(node)


# font_ast = ast.parse(test_font_code)
# font_program_visitor = FontVisitor()
# # This will first collect all the user defined functions, and then visit the ast
# font_program_visitor.visit(font_ast)

# print(font_program_visitor.fonts)
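
The following self-contained sketch shows the same idea on a toy program. It differs from `FontVisitor` in three ways that are assumptions of this sketch: it skips the translation step and only looks for the English name `text`, it accepts only string literals as font names, and it merges spellings that differ in case. The function name `count_fonts` is hypothetical.

```python
import ast
from collections import Counter


def count_fonts(source: str, function_name: str = "text") -> Counter:
    """Count string literals passed as second argument to `function_name`."""
    fonts: Counter = Counter()
    for node in ast.walk(ast.parse(source)):
        if (
            isinstance(node, ast.Call)
            and isinstance(node.func, ast.Name)
            and node.func.id == function_name
            and len(node.args) == 4
        ):
            font = node.args[1]
            if isinstance(font, ast.Constant) and isinstance(font.value, str):
                # casefold() merges "Arial" and "arial" into one entry.
                fonts[font.value.casefold()] += 1
            else:
                # Variables and other expressions have no literal font name.
                fonts["N/A"] += 1
    return fonts


sample = '''
title = text("Hello", "Arial", 12, black)
label = text("World", "arial", 10, red)
other = text("Again", my_font, 10, red)
'''
print(count_fonts(sample))
```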

In [16]:
# font
import os, ast

font_analysis = {}
source_code_dir = "students-code"
# get number of folders inside students-code
n_students_code = len(os.listdir(source_code_dir))
general_program_visitor = FontVisitor()

for root, dirs, files in os.walk(source_code_dir):
    for index, dir_name in enumerate(dirs):
        user_code = ""
        for file in os.listdir(os.path.join(root, dir_name)):
            if file == "cell.py":
                with open(os.path.join(root, dir_name, file), "r") as f:
                    temp_code = f.read()
                    if temp_code != "" and temp_code != "pass":
                        user_code += temp_code + "\n\n"
        student_id: int = int(dir_name)
        font_analysis[student_id] = {}
        font_analysis[student_id]["code"] = user_code
        try:
            temp_user_program_ast = ast.parse(user_code)
            font_analysis[student_id]["error"] = None
            temp_user_program_visitor = FontVisitor()
            temp_user_program_visitor.visit(temp_user_program_ast)
            general_program_visitor.visit(temp_user_program_ast)
            font_analysis[student_id]["fonts"] = temp_user_program_visitor.fonts
        except Exception as e:
            font_analysis[student_id]["error"] = str(e)
        if index % 10000 == 0:
            print(f"Processed {index + 1}/{n_students_code} students")

Processed 1/359389 students
Processed 10001/359389 students
Processed 20001/359389 students
Processed 30001/359389 students
Processed 40001/359389 students
Processed 50001/359389 students
Processed 60001/359389 students
Processed 70001/359389 students
Processed 80001/359389 students
Processed 90001/359389 students
Processed 100001/359389 students
Processed 110001/359389 students
Processed 120001/359389 students
Processed 130001/359389 students
Processed 140001/359389 students
Processed 150001/359389 students
Processed 160001/359389 students
Processed 170001/359389 students
Processed 180001/359389 students
Processed 190001/359389 students
Processed 200001/359389 students
Processed 210001/359389 students
Processed 220001/359389 students
Processed 230001/359389 students
Processed 240001/359389 students
Processed 250001/359389 students
Processed 260001/359389 students
Processed 270001/359389 students
Processed 280001/359389 students
Processed 290001/359389 students
Processed 300001/359389 students
Processed 310001/359389 students
Processed 320001/359389 students
Processed 330001/359389 students
Processed 340001/359389 students
Processed 350001/359389 students


In [20]:
sorted_general_fonts = dict(
    sorted(
        general_program_visitor.fonts.items(),
        key=lambda item: item[1],
        reverse=True,
    )
)

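# keys that are not strings (for example numbers or AST nodes) are renamed to "N/A";
# since they all end up under the same key, later counts overwrite earlier ones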
filtered_general_fonts = {key if isinstance(key, str) else "N/A": value for key, value in sorted_general_fonts.items()}

print(
    json.dumps(
        filtered_general_fonts,
        indent=4,
    )
)

{
    "arial": 3434,
    "Arial": 915,
    "Roboto": 866,
    "N/A": 1,
    "roboto": 407,
    "Fira Sans": 398,
    "Fira Code": 375,
    "Libre Bodoni": 304,
    "verdana": 158,
    "Calibri": 84,
    "": 65,
    "helvetica": 47,
    "bell mt": 40,
    "pt serif": 33,
    "calibri": 28,
    "font": 27,
    "ariel": 25,
    "sans serif": 24,
    "roboto sherif": 23,
    "sans-serif": 19,
    "Jost": 17,
    "Helvetica": 16,
    "Sans Serif": 16,
    "ARIAL": 13,
    "Aptos": 12,
    "Sans Code": 11,
    "Roboto Mono": 10,
    "Courier": 8,
    "FiraSans": 8,
    "DejaVu Sans": 8,
    "Roboto Condensed": 8,
    "cormorant garamond": 8,
    "Verdana": 8,
    "Bookman old style": 8,
    "Papyrus": 8,
    "DejaVu Serif": 6,
    "Inter": 6,
    "robot": 5,
    "DejaVu Sans Mono": 5,
    "Nigel Nagel neues Nagellack": 5,
    "IBM Plex Sans": 5,
    "PT Serif": 5,
    "Roboto Serif": 4,
    "hallo": 4,
    "!li!i!li!i!li!": 4,
    "nero": 4,
    "gruen": 4,
    "Noto Sans": 3,
    "Font": 3,