# **`MAKEMORE PART2 TURKISH`**

In [1]:
import torch
import torch.nn.functional as F
import matplotlib.pyplot as plt
%matplotlib inline  

In [2]:
import csv

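# utf-8 so that Turkish letters such as ç, ğ, ı, ö, ş, ü are read correctly
with open('turkce_isim.csv', 'r', encoding='utf-8') as f:
    reader = csv.reader(f)
    names = [row[0] for row in reader]

names.pop(0)  # drop the first row (presumably the CSV header)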

print(names[:8])

['aba', 'abaca', 'abacan', 'abaç', 'abay', 'abayhan', 'abaza', 'abbas']


In [3]:
words = names
words[:5]

['aba', 'abaca', 'abacan', 'abaç', 'abay']

In [4]:
turkish_sort = ['a', 'b', 'c', 'ç', 'd', 'e', 'f', 'g', 'ğ', 'h', 'ı', 'i', 'j', 'k', 'l', 'm', 'n', 'o','ö', 'p', 'r', 's', 'ş', 't', 'u','ü', 'v', 'w', 'x', 'y', 'z']
turkish_sort.insert(0,'.')
turkish_sort,len(turkish_sort)
stoi = {ch:i for i,ch in enumerate(turkish_sort)}
itos = {i: st for st, i in stoi.items()}
print(stoi)
print(itos)

{'.': 0, 'a': 1, 'b': 2, 'c': 3, 'ç': 4, 'd': 5, 'e': 6, 'f': 7, 'g': 8, 'ğ': 9, 'h': 10, 'ı': 11, 'i': 12, 'j': 13, 'k': 14, 'l': 15, 'm': 16, 'n': 17, 'o': 18, 'ö': 19, 'p': 20, 'r': 21, 's': 22, 'ş': 23, 't': 24, 'u': 25, 'ü': 26, 'v': 27, 'w': 28, 'x': 29, 'y': 30, 'z': 31}
{0: '.', 1: 'a', 2: 'b', 3: 'c', 4: 'ç', 5: 'd', 6: 'e', 7: 'f', 8: 'g', 9: 'ğ', 10: 'h', 11: 'ı', 12: 'i', 13: 'j', 14: 'k', 15: 'l', 16: 'm', 17: 'n', 18: 'o', 19: 'ö', 20: 'p', 21: 'r', 22: 's', 23: 'ş', 24: 't', 25: 'u', 26: 'ü', 27: 'v', 28: 'w', 29: 'x', 30: 'y', 31: 'z'}


## **`Dataset`:**
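**Each training example is a context of `block_size` characters (padded with `.` at the start of a name) together with the character that follows it.**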

In [5]:
block_size = 3  # number of previous characters used to predict the next one
X, Y = [], []
for w in words[8:14]:
    context = [0]*block_size  # every name starts from '.' padding (index 0)
    print(w)
    for le in w + '.':
        ix = stoi[le]
        X.append(context)
        Y.append(ix)
        print(''.join(itos[i] for i in context), '===>', itos[ix])
        context = context[1:] + [ix]  # slide the window one character forward
X = torch.tensor(X)
Y = torch.tensor(Y)

abdal
... ===> a
..a ===> b
.ab ===> d
abd ===> a
bda ===> l
dal ===> .
abdi
... ===> a
..a ===> b
.ab ===> d
abd ===> i
bdi ===> .
abdullah
... ===> a
..a ===> b
.ab ===> d
abd ===> u
bdu ===> l
dul ===> l
ull ===> a
lla ===> h
lah ===> .
abdurrahman
... ===> a
..a ===> b
.ab ===> d
abd ===> u
bdu ===> r
dur ===> r
urr ===> a
rra ===> h
rah ===> m
ahm ===> a
hma ===> n
man ===> .
abdülalim
... ===> a
..a ===> b
.ab ===> d
abd ===> ü
bdü ===> l
dül ===> a
üla ===> l
lal ===> i
ali ===> m
lim ===> .
abdülazim
... ===> a
..a ===> b
.ab ===> d
abd ===> ü
bdü ===> l
dül ===> a
üla ===> z
laz ===> i
azi ===> m
zim ===> .


In [6]:
X.shape,X.dtype, Y.shape, Y.dtype

(torch.Size([52, 3]), torch.int64, torch.Size([52]), torch.int64)
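
The same loop can be wrapped in a function so it can be reused on any list of words. This is only a minimal sketch: the name `build_dataset` and the tiny `toy_stoi` vocabulary are made up for this example and are not part of the notebook.

```python
import torch

def build_dataset(words, stoi, block_size=3):
    """Turn words into (context, next-character) index pairs."""
    X, Y = [], []
    for w in words:
        context = [0] * block_size            # start with '.' padding (index 0)
        for ch in w + '.':
            ix = stoi[ch]
            X.append(context)
            Y.append(ix)
            context = context[1:] + [ix]      # slide the window by one character
    return torch.tensor(X), torch.tensor(Y)

# Hypothetical mini vocabulary, just large enough for the two names below
toy_chars = ['.', 'a', 'b', 'd', 'i', 'l']
toy_stoi = {ch: i for i, ch in enumerate(toy_chars)}

X_toy, Y_toy = build_dataset(['abdal', 'abdi'], toy_stoi)
print(X_toy.shape, Y_toy.shape)  # torch.Size([11, 3]) torch.Size([11])
```

A name with `n` letters produces `n + 1` examples, because the closing `.` is also a target.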

### **`Important`:**
**The part I find most important in this second lecture is the embedding. In the paper, Bengio et al. use a 30-dimensional feature vector for each word; here Andrej uses a 2-dimensional vector for each letter. The previous example had no embedding at all.**

### **`Embeddings`:**
**Just two floating-point numbers for each integer (the index of a character).**

In [7]:
C = torch.randn(len(turkish_sort), 2)
C.shape

torch.Size([32, 2])

### **`One hot encoding`:**
**Understanding the effect of a one-hot vector during matrix multiplication: only the single `1` in the vector affects the result.**
```python
F.one_hot(torch.tensor(5), num_classes=len(turkish_sort)).float() @ C
```
**This just plucks out the row at index 5 of the `C` lookup table.**


In [12]:
F.one_hot(torch.tensor(5), num_classes=len(turkish_sort)).float() @ C

tensor([1.0016, 1.1534])
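
A quick way to convince yourself of this is to compare the one-hot product with plain indexing. The sketch below uses its own random table `C_demo` (an assumed stand-in, so the numbers differ from the output above) and a vocabulary size of 32 to match `len(turkish_sort)`.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
vocab_size = 32                        # same size as turkish_sort
C_demo = torch.randn(vocab_size, 2)    # stand-in for the lookup table C

via_one_hot = F.one_hot(torch.tensor(5), num_classes=vocab_size).float() @ C_demo
via_index = C_demo[5]

# Both routes select row 5 of the table
print(torch.allclose(via_one_hot, via_index))
```

Indexing is the cheaper route, since it skips building the one-hot vector and the multiplication by zeros.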

In [15]:
# to get more than one row from C, pass a list of indexes
# to the one_hot function;
# the result holds the embedding vectors of those indexes
one_hot = F.one_hot(torch.tensor([5, 6]), num_classes=len(turkish_sort)).float() @ C
one_hot


tensor([[ 1.0016,  1.1534],
        [ 0.4804, -0.7848]])

In [16]:
# shape[0] is the number of embeddings
# shape[1] is the number of dimensions of each embedding
one_hot.shape

torch.Size([2, 2])

In [22]:
# indexing the lookup table with the training examples returns the embedding of every character in every example
C[X].shape, C[X][:3]

(torch.Size([52, 3, 2]),
 tensor([[[ 0.4213, -1.5737],
          [ 0.4213, -1.5737],
          [ 0.4213, -1.5737]],
 
         [[ 0.4213, -1.5737],
          [ 0.4213, -1.5737],
          [-0.2250, -0.0521]],
 
         [[ 0.4213, -1.5737],
          [-0.2250, -0.0521],
          [-1.0109, -0.7784]]]))
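
Indexing with a 2-D integer tensor simply looks up every entry, so the result gains one extra dimension for the embedding. A small sketch with assumed stand-in tensors (`C_demo`, `X_demo`) rather than the notebook's own `C` and `X`:

```python
import torch

torch.manual_seed(0)
C_demo = torch.randn(32, 2)                    # stand-in lookup table
X_demo = torch.tensor([[0, 0, 1],
                       [0, 1, 2]])             # two contexts of three character indexes

emb_demo = C_demo[X_demo]
print(emb_demo.shape)  # torch.Size([2, 3, 2])

# Position (1, 2) of the result is the embedding of the index stored at X_demo[1, 2]
print(torch.equal(emb_demo[1, 2], C_demo[X_demo[1, 2]]))
```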

In [23]:
# 6 inputs = 3 context letters x 2 embedding dimensions each
# 100 is an arbitrary number of neurons in the hidden layer
# b1 is the bias for the hidden layer
W1 = torch.randn(6,100)
b1 = torch.randn(100)
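
To make the origin of the `6` explicit, the sizes can be derived from named constants. This is only a sketch: `emb_dim`, `n_hidden`, and the `_demo` tensors are names introduced here, and `flat_inputs` is a random stand-in for the flattened embeddings.

```python
import torch

block_size = 3      # characters of context, as in the dataset cell
emb_dim = 2         # numbers per character embedding
n_hidden = 100      # arbitrary hidden-layer width

torch.manual_seed(0)
flat_inputs = torch.randn(52, block_size * emb_dim)    # one row of 6 numbers per example
W1_demo = torch.randn(block_size * emb_dim, n_hidden)
b1_demo = torch.randn(n_hidden)

pre_activations = flat_inputs @ W1_demo + b1_demo
print(W1_demo.shape, pre_activations.shape)  # torch.Size([6, 100]) torch.Size([52, 100])
```

Each of the 52 examples ends up with 100 numbers, one per hidden neuron.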

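### **`This section looks like a mini PyTorch course, like the part of the previous lecture on torch.sum()`:**
#### **Understand the `cat` (concatenate), `unbind`, and `view` operations:**
**These operations are different ways to rearrange `emb` of shape `[52, 3, 2]` into the `[52, 6]` input that `W1` expects.**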


In [26]:
emb = C[X]
emb.shape

torch.Size([52, 3, 2])

In [57]:
x = torch.randn(1, 2)
x

tensor([[-0.6225,  0.8961]])

#### **Understand the `cat`(concatenate):**
**Pretty straightforward; the dimension is the key. For 2-D tensors, `0` stacks the inputs as rows and `1` places them side by side as columns.**


In [59]:
torch.cat((x,x,x),0)

tensor([[-0.6225,  0.8961],
        [-0.6225,  0.8961],
        [-0.6225,  0.8961]])
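
For comparison, here is the same concatenation along both dimensions. The tensor `x_demo` uses fixed, made-up values so the shapes are easy to follow.

```python
import torch

x_demo = torch.tensor([[1.0, 2.0]])            # shape (1, 2)

as_rows = torch.cat((x_demo, x_demo, x_demo), 0)   # stacked on top of each other
as_cols = torch.cat((x_demo, x_demo, x_demo), 1)   # placed side by side

print(as_rows.shape)  # torch.Size([3, 2])
print(as_cols.shape)  # torch.Size([1, 6])
```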

#### **Understand the `unbind`:**
**`unbind` removes one dimension and returns a tuple of the slices along it.**

In [67]:
x3 = torch.cat((x,x,x),0)
x3

tensor([[-0.6225,  0.8961],
        [-0.6225,  0.8961],
        [-0.6225,  0.8961]])

In [69]:
torch.unbind(x3,0)

(tensor([-0.6225,  0.8961]),
 tensor([-0.6225,  0.8961]),
 tensor([-0.6225,  0.8961]))
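
Combined with `cat`, `unbind` gives one way to flatten the embeddings of a context into a single row. A minimal sketch on an assumed stand-in tensor `emb_demo` of shape `(4, 3, 2)`, meaning 4 examples, 3 letters, 2 numbers per letter:

```python
import torch

torch.manual_seed(0)
emb_demo = torch.randn(4, 3, 2)

# Split along the "letter" dimension: 3 tensors of shape (4, 2)
parts = torch.unbind(emb_demo, 1)

# Glue the three letters back together side by side: shape (4, 6)
flat_cat = torch.cat(parts, 1)
print(len(parts), parts[0].shape, flat_cat.shape)
```

This works for any `block_size`, because the number of pieces follows the size of dimension 1.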

#### **Understand the `view`:**
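**`view` reinterprets the same data with a new shape; `-1` lets PyTorch infer the size of that dimension.**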

In [83]:
# the embeddings of the first three characters of two names look like this:

first_three_of_first  = torch.randn(1,3,2)

first_three_of_second  = torch.randn(1,3,2)

together = torch.cat((first_three_of_first,first_three_of_second),0)
together



tensor([[[ 0.2810,  0.2375],
         [-0.0231, -0.7654],
         [-0.3987,  0.9318]],

        [[-2.1862,  0.1347],
         [ 0.1889, -0.2003],
         [-1.7675, -0.5961]]])

In [88]:
# flatten each name into one row; -1 lets PyTorch infer the first dimension (2, one per name)
together.view(-1,6)

tensor([[ 0.2810,  0.2375, -0.0231, -0.7654, -0.3987,  0.9318],
        [-2.1862,  0.1347,  0.1889, -0.2003, -1.7675, -0.5961]])
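
The result of `view` matches the `unbind` + `cat` route, but `view` does not copy anything. The sketch below checks both points on an assumed stand-in tensor `emb_demo`; `data_ptr()` returns the address of a tensor's first element.

```python
import torch

torch.manual_seed(0)
emb_demo = torch.randn(4, 3, 2)

flat_view = emb_demo.view(-1, 6)                       # reinterpret the same memory
flat_cat = torch.cat(torch.unbind(emb_demo, 1), 1)     # builds a new tensor

print(torch.equal(flat_view, flat_cat))                # same values
print(flat_view.data_ptr() == emb_demo.data_ptr())     # view shares storage with emb_demo
print(flat_cat.data_ptr() == emb_demo.data_ptr())      # cat allocated new memory
```

Because no data is moved, `view` is the cheaper choice here.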

### **`Important: how bias vector added`:**
**The bias is added by broadcasting: every row of activations gets the same vector added to it, not the same single number. Element `j` of the bias vector is added to the activation of neuron `j`.**
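
A tiny example makes the bias addition visible. The values are made up: `acts` plays the role of the activations for 2 examples and 3 neurons, and `bias` the role of the bias vector.

```python
import torch

acts = torch.zeros(2, 3)                     # 2 examples, 3 neurons
bias = torch.tensor([10.0, 20.0, 30.0])      # one bias per neuron

out = acts + bias     # bias of shape (3,) is broadcast across the rows of (2, 3)
print(out)
# tensor([[10., 20., 30.],
#         [10., 20., 30.]])
```

Every example receives the same vector, and each neuron (column) gets its own bias value.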

### **`grad=None`:**
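**As far as I understand, the forward pass does not fill in `.grad`; it only records the operations needed for backpropagation. Setting the gradient to `None` before the backward pass clears whatever was left from the previous step, so that gradients do not accumulate. The gradient is formed only when the backward pass runs, just after the loss is calculated. Hope to see more of this in the backprop lecture.**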
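
This can be checked on a single parameter. The sketch below uses a made-up tensor `w` and a made-up loss (the sum of squares, whose gradient is `2 * w`); it is not the training loop of the lecture.

```python
import torch

torch.manual_seed(0)
w = torch.randn(3, requires_grad=True)

loss = (w ** 2).sum()                 # forward pass
grad_after_forward = w.grad           # still None: nothing computed yet

loss.backward()                       # backward pass fills w.grad with 2 * w
first = w.grad.clone()

(w ** 2).sum().backward()             # no reset: the new gradient is ADDED to the old one
accumulated = w.grad.clone()

w.grad = None                         # reset before the next backward pass
(w ** 2).sum().backward()
fresh = w.grad.clone()

print(grad_after_forward)
print(torch.allclose(accumulated, 2 * first), torch.allclose(fresh, first))
```

Without the reset, each backward pass adds to the stored gradient, which is why it is cleared at every step.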