# Part IV - Optimization
## Project 4 - Crossword

[Course Link](https://cs50.harvard.edu/ai/)

[Project Instructions](https://cs50.harvard.edu/ai/projects/3/crossword/)

## Instructions
The purpose of this project is to generate a crossword puzzle by treating it as a constraint satisfaction problem. 

* Each sequence of squares (for one word) is **ONE VARIABLE**
* The **domain** of each variable is the set of words that could be placed in it

We need to decide which word in the domain of all possible words can fill a given variable sequence. 

Each variable is defined by these values:
1. the row and col it begins on (i, j) respectively
2. the direction of the word (either DOWN or ACROSS)
3. the length of the word

<img src='images/ex1.png'>

Variable 1, for example, would be a variable represented by a row of 1 (assuming 0 indexed counting from the top), a column of 1 (also assuming 0 indexed counting from the left), a direction of across, and a length of 4.
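
As a rough illustration of the four values above, the sketch below models a variable as a small dataclass. The name `SketchVariable` and the `cells()` helper are hypothetical stand-ins chosen for this example; they are not the project's actual `Variable` class.

```python
from dataclasses import dataclass

ACROSS = "across"
DOWN = "down"


@dataclass(frozen=True)
class SketchVariable:
    """Hypothetical stand-in for the project's Variable class."""
    i: int          # starting row, 0-indexed from the top
    j: int          # starting column, 0-indexed from the left
    direction: str  # ACROSS or DOWN
    length: int     # number of squares

    def cells(self):
        """Return the (row, col) coordinates the variable occupies."""
        return [
            (self.i + (k if self.direction == DOWN else 0),
             self.j + (k if self.direction == ACROSS else 0))
            for k in range(self.length)
        ]


# Variable 1 from the text: row 1, column 1, across, length 4
variable_1 = SketchVariable(i=1, j=1, direction=ACROSS, length=4)
print(variable_1.cells())  # [(1, 1), (1, 2), (1, 3), (1, 4)]
```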

Variables in crossword puzzles have both unary and binary constraints. 
The **Unary Constraint** of a crossword variable is given by its **length**. For Variable 1 above, the value 'byte' would satisfy the unary constraint, but the value 'bit' would not. **Any values that don’t satisfy a variable’s unary constraints can therefore be removed from the variable’s domain immediately.**
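
A minimal sketch of that pruning step, assuming a small made-up vocabulary (the word list here is illustrative only and is not taken from the project's data files):

```python
# Hypothetical vocabulary for illustration
vocabulary = {"bit", "byte", "data", "node", "graph"}

# Variable 1 has length 4, so only 4-letter words satisfy its unary constraint
required_length = 4
domain = {word for word in vocabulary if len(word) == required_length}

print(sorted(domain))  # ['byte', 'data', 'node']
```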

The **binary constraints** on a variable are **its overlap with neighboring variables**. Variable 1 has a single neighbor: Variable 2. Variable 2 has two neighbors: Variable 1 and Variable 3. For each pair of neighboring variables, those variables share an overlap: a single square that is common to them both. We can represent that overlap as the character index (i,j) in each variable’s word that must be the same character.

For example, the overlap between Variable 1 and Variable 2 might be represented as the pair (1, 0), meaning that Variable 1’s character at index 1 necessarily must be the same as Variable 2’s character at index 0 (assuming 0-indexing, again). The overlap between Variable 2 and Variable 3 would therefore be represented as the pair (3, 1): character 3 of Variable 2’s value must be the same as character 1 of Variable 3’s value.
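
The overlap pair can be checked directly against two candidate words. The helper name `satisfies_overlap` and the sample words below are assumptions made for this sketch:

```python
def satisfies_overlap(word_x, word_y, overlap):
    """Return True if the two words agree on the shared square."""
    i, j = overlap
    return word_x[i] == word_y[j]


# Overlap (1, 0): character 1 of the first word must equal
# character 0 of the second word
print(satisfies_overlap("byte", "year", (1, 0)))  # True: 'y' == 'y'
print(satisfies_overlap("byte", "tree", (1, 0)))  # False: 'y' != 't'
```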

For this problem, we’ll add the additional constraint that **all words must be different: the same word should not be repeated multiple times in the puzzle**.

The challenge ahead, then, is to write a program to find a satisfying assignment: a different word (from a given vocabulary list) for each variable such that all of the unary and binary constraints are met.
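
Putting the three kinds of constraints together, a complete assignment could be validated along the following lines. This is only a sketch: `is_satisfying`, the plain string variable names, and the `lengths` and `overlaps` dictionaries are hypothetical simplifications of the project's data structures.

```python
def is_satisfying(assignment, lengths, overlaps):
    """Check the all-different, unary, and binary constraints."""
    words = list(assignment.values())

    # All words must be different
    if len(set(words)) != len(words):
        return False

    # Unary constraint: each word has the length of its variable
    if any(len(assignment[var]) != lengths[var] for var in assignment):
        return False

    # Binary constraint: overlapping squares hold the same character
    return all(
        assignment[x][i] == assignment[y][j]
        for (x, y), (i, j) in overlaps.items()
    )


lengths = {"v1": 4, "v2": 4}
overlaps = {("v1", "v2"): (1, 0)}

print(is_satisfying({"v1": "byte", "v2": "year"}, lengths, overlaps))  # True
print(is_satisfying({"v1": "byte", "v2": "byte"}, lengths, overlaps))  # False
```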

---
## Accompanying Files and Classes
There are two files associated with this project, crossword.py and generate.py. 

### crossword.py
crossword.py is provided with the project and contains the Variable() and Crossword() classes. 

* A **Variable()** object represents one sequence of empty squares (one word slot) on a crossword puzzle and requires 4 parameters:
    1. **starting row (i)**
    2. **starting column (j)**
    3. **direction (Across, Down)**
    4. **length (number of squares)**
    
    
* A **Crossword()** object represents the actual crossword puzzle board and requires two parameters:
    1. **structure_file** - a text file defining the puzzle's structure (_ denotes a blank cell; every other character is a wall)
    
    2. **words_file** - a file that contains the list of words used as the puzzle's vocabulary
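
To make the structure format concrete, the sketch below turns a small text layout into a grid of booleans. The layout string is a made-up example rather than the contents of any project file, and the parsing shown is an assumption about how such a file could be read, not the code in crossword.py.

```python
# Hypothetical structure text: "_" is a blank cell, anything else is a wall
structure_text = """\
#___#
#_##_
#_##_
#_##_
#____
"""

lines = structure_text.splitlines()
height = len(lines)
width = max(len(line) for line in lines)

# True where a character must be filled in, False for walls
structure = [
    [col < len(line) and line[col] == "_" for col in range(width)]
    for line in lines
]

print(height, width)  # 5 5
print(structure[0])   # [False, True, True, True, False]
```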
    
For any crossword object, the following values are stored:
* **crossword.height** - is an integer representing the height of the crossword puzzle


* **crossword.width** - is an integer representing the width of the crossword puzzle


* **crossword.structure** - is a 2D list representing the structure of the puzzle. For any valid row i and column j, crossword.structure[i][j] will be True if the cell is blank (a character must be filled there) and will be False otherwise (no character is to be filled in that cell)


* **crossword.words** - is a set of all of the words to draw from when constructing the crossword puzzle.


* **crossword.variables** - is a set of all of the variables in the puzzle (each is a Variable object)


* **crossword.overlaps** - is a dictionary mapping a pair of variables to their overlap. For any two distinct variables v1 and v2, crossword.overlaps[v1, v2] will be None if the two variables have no overlap, and will be a pair of integers (i, j) if the variables do overlap. The pair (i, j) should be interpreted to mean that the ith character of v1’s value must be the same as the jth character of v2’s value


* **neighbors()** - a method that returns all of the variables that overlap with a given variable. That is to say, crossword.neighbors(v1) will return a set of all of the variables that are neighbors to the variable v1
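
One way to picture how overlaps and neighbors relate is to derive both from the cells each variable covers. The sketch below does this for three hypothetical variables laid out so that the overlaps match the (1, 0) and (3, 1) pairs discussed earlier; the `cells` dictionary and the string names are assumptions for illustration, not the project's internals.

```python
from itertools import permutations

# Hypothetical variables: name -> ordered list of (row, col) cells
cells = {
    "v1": [(1, 1), (1, 2), (1, 3), (1, 4)],          # across
    "v2": [(1, 2), (2, 2), (3, 2), (4, 2), (5, 2)],  # down
    "v3": [(4, 1), (4, 2), (4, 3), (4, 4)],          # across
}

# For every ordered pair, record the character indices of the shared cell
overlaps = {}
for a, b in permutations(cells, 2):
    shared = set(cells[a]) & set(cells[b])
    if shared:
        cell = shared.pop()
        overlaps[a, b] = (cells[a].index(cell), cells[b].index(cell))
    else:
        overlaps[a, b] = None


def neighbors(var):
    """Return the set of variables that overlap with var."""
    return {
        other for other in cells
        if other != var and overlaps[var, other] is not None
    }


print(overlaps["v1", "v2"])     # (1, 0)
print(overlaps["v2", "v3"])     # (3, 1)
print(overlaps["v1", "v3"])     # None
print(sorted(neighbors("v2")))  # ['v1', 'v3']
```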

---
### generate.py
Contains a class **CrosswordCreator** that is used to solve a given crossword puzzle; its only parameter is a crossword object. Below are the attributes and methods of each CrosswordCreator object:

* **domains** - is a property that is a dictionary which maps variables to a set of all possible words the variable might take on as a value. Initially, the set of words will be the entire given vocabulary 


* **print()** - a method that prints (to the terminal) a representation of your crossword puzzle for a given assignment (every assignment, in this function and elsewhere, is a dictionary mapping variables to their corresponding words)


* **save()** - a method that generates an image file corresponding to a given assignment


* **letter_grid()** is a helper method used by both print and save that generates a 2D list of all characters in their appropriate positions for a given assignment


* **solve()** - a method that calls 3 other methods:
    * **enforce_node_consistency()** - enforces node consistency on the crossword puzzle, ensuring that every value in a variable’s domain satisfies the unary constraints
    * **ac3()** - enforces arc consistency, which removes from each variable’s domain any value that has no compatible value in a neighboring variable’s domain
    * **backtrack()** - a method called on the initial assignment to try to calculate a solution to the problem (in this notebook, solve() builds that assignment by mapping every variable that is still undecided to None)
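
The arc consistency step can be sketched independently of the class. The functions below are simplified, standalone stand-ins written for this example (plain dictionaries and string variable names are assumptions, and the sample words are made up). A word is kept only if a different word in the neighbor's domain matches it at the overlap.

```python
from collections import deque


def revise(domains, x, y, overlap):
    """Drop words from x's domain that no different word in y supports."""
    i, j = overlap
    unsupported = {
        wx for wx in domains[x]
        if not any(wy != wx and wy[j] == wx[i] for wy in domains[y])
    }
    domains[x] -= unsupported
    return bool(unsupported)


def ac3(domains, overlaps):
    """Process directed arcs until no domain changes; False if one empties."""
    pending = deque(overlaps)
    while pending:
        x, y = pending.popleft()
        if revise(domains, x, y, overlaps[x, y]):
            if not domains[x]:
                return False
            # x changed, so every other arc pointing at x must be rechecked
            pending.extend((z, w) for (z, w) in overlaps if w == x and z != y)
    return True


domains = {"v1": {"byte", "data", "node"}, "v2": {"year", "open"}}
overlaps = {("v1", "v2"): (1, 0), ("v2", "v1"): (0, 1)}

print(ac3(domains, overlaps))  # True
print({var: sorted(words) for var, words in domains.items()})
# {'v1': ['byte', 'node'], 'v2': ['open', 'year']}
```

Here 'data' is removed from v1 because no word in v2 starts with 'a', while every word in v2 still has a matching partner in v1.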
    
    
## Specifications
Complete the implementation of **enforce_node_consistency**, **revise**, **ac3**, **assignment_complete**, **consistent**, **order_domain_values**, **select_unassigned_variable**, and **backtrack** in generate.py so that your AI generates complete crossword puzzles if it is possible to do so.

### Note: I put all functions from generate.py into this file and only import the crossword.py helpers


In [1]:
import sys
import queue
from crossword import *

class CrosswordCreator():
   
    def __init__(self, crossword):
        """
        Create new CSP crossword generate.
        """
        self.crossword = crossword
        self.domains = {
            var: self.crossword.words.copy()
            for var in self.crossword.variables
        }

        
    def letter_grid(self, assignment):
        """
        Return 2D array representing a given assignment.
        """
        letters = [
            [None for _ in range(self.crossword.width)]
            for _ in range(self.crossword.height)
        ]
        for variable, word in assignment.items():
            direction = variable.direction
            for k in range(len(word)):
                i = variable.i + (k if direction == Variable.DOWN else 0)
                j = variable.j + (k if direction == Variable.ACROSS else 0)
                letters[i][j] = word[k]
        return letters

    def print(self, assignment):
        """
        Print crossword assignment to the terminal.
        """
        letters = self.letter_grid(assignment)
        for i in range(self.crossword.height):
            for j in range(self.crossword.width):
                if self.crossword.structure[i][j]:
                    print(letters[i][j] or " ", end="")
                else:
                    print("█", end="")
            print()

            
    def save(self, assignment, filename):
        """
        Save crossword assignment to an image file.
        """
        from PIL import Image, ImageDraw, ImageFont
        cell_size = 100
        cell_border = 2
        interior_size = cell_size - 2 * cell_border
        letters = self.letter_grid(assignment)

        # Create a blank canvas
        img = Image.new(
            "RGBA",
            (self.crossword.width * cell_size,
             self.crossword.height * cell_size),
            "black"
        )
        font = ImageFont.truetype("assets/fonts/OpenSans-Regular.ttf", 80)
        draw = ImageDraw.Draw(img)

        for i in range(self.crossword.height):
            for j in range(self.crossword.width):

                rect = [
                    (j * cell_size + cell_border,
                     i * cell_size + cell_border),
                    ((j + 1) * cell_size - cell_border,
                     (i + 1) * cell_size - cell_border)
                ]
                if self.crossword.structure[i][j]:
                    draw.rectangle(rect, fill="white")
                    if letters[i][j]:
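                        # textsize() was removed in Pillow 10; textbbox() returns
                        # (left, top, right, bottom) for text anchored at (0, 0)
                        _, _, w, h = draw.textbbox((0, 0), letters[i][j], font=font)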
                        draw.text(
                            (rect[0][0] + ((interior_size - w) / 2),
                             rect[0][1] + ((interior_size - h) / 2) - 10),
                            letters[i][j], fill="black", font=font
                        )

        img.save(filename)

        
    def solve(self):
        """
        Enforce node and arc consistency, and then solve the CSP.
        """
        self.enforce_node_consistency()
        # ac3() returns False when some domain becomes empty (no solution)
        if not self.ac3():
            return None
        
        assignment = {}
        for variable in self.crossword.variables:
            if len(self.domains[variable]) == 1:
                assignment.update({variable: list(self.domains[variable])[0]})
            else:  
                assignment.update({variable:None})
            
        return self.backtrack(assignment)

    
    def enforce_node_consistency(self):
        """
        Update `self.domains` such that each variable is node-consistent.
        (Remove any values that are inconsistent with a variable's unary
         constraints; in this case, the length of the word.)
        """
        for var, words in self.domains.items():
            new_domain = set()
            for word in words:
                if var.length == len(word):
                    new_domain.add(word) 
            self.domains[var] = new_domain
        print('Step 1: Unary Node Consistency Achieved!')

            
    def revise(self, x, y, pos):
        """
        Make variable `x` arc consistent with variable `y`.
        To do so, remove values from `self.domains[x]` for which there is no
        possible corresponding value for `y` in `self.domains[y]`.

        Return True if a revision was made to the domain of `x`; return
        False if no revision was made.
        """
        # pos is the overlap (i, j): character i of x's word must equal
        # character j of y's word
        i, j = pos

        # A word in x's domain is supported when some *different* word in
        # y's domain has the same letter at the overlap. Requiring a
        # different word also respects the rule that no word is used twice.
        unsupported = {
            word for word in self.domains[x]
            if not any(
                other != word and other[j] == word[i]
                for other in self.domains[y]
            )
        }

        # Remove every unsupported word from x's domain
        self.domains[x] -= unsupported
        return len(unsupported) > 0
          

    def ac3(self, arcs=None):
        """
        Update `self.domains` such that each variable is arc consistent.
        If `arcs` is None, begin with initial list of all arcs in the problem.
        Otherwise, use `arcs` as the initial list of arcs to make consistent.

        Return True if arc consistency is enforced and no domains are empty;
        return False if one or more domains end up empty.
        """       
        overlaps = self.crossword.overlaps
        q = queue.Queue()

        # Arcs are directed: revising X against Y does not revise Y against X,
        # so both (X, Y) and (Y, X) are queued for every overlapping pair
        if arcs is None:
            arcs = [
                pair for pair, overlap in overlaps.items()
                if overlap is not None
            ]
        for pair in arcs:
            q.put(pair)

        while not q.empty():
            X, Y = q.get()

            # If a revision was made to X's domain, re-queue the arcs (Z, X)
            # for every neighbor Z of X other than Y, since words in Z's
            # domain may have lost their support in X
            if self.revise(X, Y, overlaps[X, Y]):
                if len(self.domains[X]) == 0:
                    return False  # No way to solve the problem
                for Z in self.crossword.neighbors(X) - {Y}:
                    q.put((Z, X))

        print('Step 2: Binary Node (Arc) Consistency Achieved!')
        return True
        
        
    def assignment_complete(self, assignment):
        """
        Return True if `assignment` is complete (i.e., assigns a value to each
        crossword variable); return False otherwise.
        """
        for k, v in assignment.items():
            if v == None:
                #print('Assignment Not Complete')
                return False 
        print('Step 3: Backtracking Complete!\n')
        print('\nAssignment Complete, Puzzle Solution:')
        print('-----------------------------------------------\n')
        return True
  
    
    def consistent(self, assignment, var, word):
        """
        Return True if `assignment` is consistent (i.e., words fit in crossword
        puzzle without conflicting characters); return False otherwise.
        """
        
        neighbors = list(self.crossword.neighbors(var))
        overlaps = self.crossword.overlaps
        
        for neighbor in neighbors:
            overlap = overlaps[(var,neighbor)]
            var_pos = overlap[0]
            n_pos = overlap[1] 
            
            # If neighbor already has value assigned, then
            # current word has to match neighbor word letter at overlap
            if assignment[neighbor] != None:
                neighbor_letter = assignment[neighbor][n_pos]
                if word[var_pos] != neighbor_letter:
                    return False
                
        return True
                
        
    def order_domain_values(self, var, assignment):
        """
        Return a list of values in the domain of `var`, in order by
        the number of values they rule out for neighboring variables.
        The first value in the list, for example, should be the one
        that rules out the fewest values among the neighbors of `var`.
        """
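        def ruled_out(word):
            # Count the neighbor values that choosing `word` would eliminate
            count = 0
            for neighbor in self.crossword.neighbors(var):
                if assignment.get(neighbor) is not None:
                    continue  # Already assigned, nothing left to rule out
                i, j = self.crossword.overlaps[var, neighbor]
                for other in self.domains[neighbor]:
                    if other == word or other[j] != word[i]:
                        count += 1
            return count

        # Least-constraining values first
        return sorted(self.domains[var], key=ruled_out)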
    

    def select_unassigned_variable(self, assignment):
        """
        Return an unassigned variable not already part of `assignment`.
        Choose the variable with the minimum number of remaining values
        in its domain. If there is a tie, choose the variable with the highest
        degree. If there is a tie, any of the tied variables are acceptable
        return values.
        """
        # Unassigned variables are the ones still mapped to None
        unassigned = [var for var, val in assignment.items() if val is None]

        # Fewest remaining values first; ties go to the highest degree
        # (the variable with the most neighbors)
        return min(
            unassigned,
            key=lambda var: (
                len(self.domains[var]),
                -len(self.crossword.neighbors(var))
            )
        )
    

    def backtrack(self, assignment):
        """
        Using Backtracking Search, take as input a partial assignment for the
        crossword and return a complete assignment if possible to do so.
        `assignment` is a mapping from variables (keys) to words (values).
        If no assignment is possible, return None.
        """
        #print('\nASSIGNMENT SENT TO BACTRACK:')
        #print(assignment)
        if self.assignment_complete(assignment):
            return(assignment)
        
        var = self.select_unassigned_variable(assignment) 
        var_domain = self.order_domain_values(var, assignment)
                     
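        for word in var_domain:
            # Skip words already used elsewhere in the puzzle
            if word in assignment.values():
                continue
            if self.consistent(assignment, var, word):
                assignment[var] = word
                result = self.backtrack(assignment)
                if result is not None:
                    return result
                # Undo the assignment before trying the next word
                assignment[var] = None
        return None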
                    

def main(structure_file, word_file, png_name):
    structure = structure_file
    words = word_file
    output = png_name

    # Generate crossword
    crossword = Crossword(structure, words)
    creator = CrosswordCreator(crossword)
    assignment = creator.solve()
     
    # Print result
    if assignment is None:
        print("\nNo solution.")
    else:
        creator.print(assignment)
        if output:
            creator.save(assignment, output)

---
## Puzzle 1

In [5]:
main('data/structure0.txt', 'data/words0.txt', 'images/test.png')

Step 1: Unary Node Consistency Achieved!
Step 2: Binary Node (Arc) Consistency Achieved!
Step 3: Backtracking Complete!


Assignment Complete, Puzzle Solution:
-----------------------------------------------

█SIX█
█E██F
█V██I
█E██V
█NINE


---
## Puzzle 2

In [6]:
main('data/structure1.txt', 'data/words1.txt', 'images/test.png')

Step 1: Unary Node Consistency Achieved!
Step 2: Binary Node (Arc) Consistency Achieved!
Step 3: Backtracking Complete!


Assignment Complete, Puzzle Solution:
-----------------------------------------------

██████████████
███████M████R█
█INTELLIGENCE█
█N█████N████S█
█F██LOGIC███O█
█E█████M████L█
█R███SEARCH█V█
███████X████E█
██████████████


---
## Puzzle 3

In [7]:
main('data/structure2.txt', 'data/words2.txt', 'images/test.png')

Step 1: Unary Node Consistency Achieved!
Step 2: Binary Node (Arc) Consistency Achieved!
Step 3: Backtracking Complete!


Assignment Complete, Puzzle Solution:
-----------------------------------------------

██████N
SELF██O
P██ROOT
I██U██I
N██I██O
█BIT██N
