```{contents}
```

# TF (Term Frequency)

**Term Frequency (TF)** measures **how frequently a term (word) occurs in a document** relative to the total number of words in that document.

Mathematically:

$$
\text{TF}(t,d) = \frac{\text{Number of times term } t \text{ appears in document } d}{\text{Total number of terms in document } d}
$$

* $t$ → specific term (word)
* $d$ → specific document

TF serves as a simple proxy for **how important a word is within a document**: the more often the term appears relative to the document's length, the higher its score.
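The formula translates directly into a few lines of Python. The sketch below is a minimal illustration, not a reference implementation: the function name `term_frequency` and the sample sentence are hypothetical, and it assumes whitespace tokenization with case-sensitive matching.

```python
def term_frequency(term, tokens):
    """Return TF(t, d): occurrences of `term` divided by the number of tokens."""
    if not tokens:
        return 0.0  # assumption: define TF as 0 for an empty document
    return tokens.count(term) / len(tokens)


# Hypothetical document, split on whitespace (an assumption of this sketch)
tokens = "the cat sat on the mat".split()

tf_the = term_frequency("the", tokens)  # 2 occurrences / 6 tokens
tf_cat = term_frequency("cat", tokens)  # 1 occurrence / 6 tokens
print(f"TF('the') = {tf_the:.3f}")
print(f"TF('cat') = {tf_cat:.3f}")
```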

---

### **2. Example**

Suppose we have a document:

```
Document: "I love NLP and I love Python"
```

**Step 1: Count each word**

| Word   | Count |
| ------ | ----- |
| I      | 2     |
| love   | 2     |
| NLP    | 1     |
| and    | 1     |
| Python | 1     |

**Step 2: Total words = 7**

**Step 3: Calculate TF**

$$
TF(\text{"I"}) = 2 / 7 \approx 0.286
$$

$$
TF(\text{"love"}) = 2 / 7 \approx 0.286
$$

$$
TF(\text{"NLP"}) = 1 / 7 \approx 0.143
$$

…and so on.
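The three steps above can be sketched with `collections.Counter` as a quick check. This assumes plain whitespace tokenization and keeps the words exactly as written (no lowercasing); the variable names are illustrative only.

```python
from collections import Counter

document = "I love NLP and I love Python"

# Step 1: count each word (whitespace tokenization, case-sensitive)
tokens = document.split()
counts = Counter(tokens)

# Step 2: total number of words in the document
total = len(tokens)

# Step 3: TF = count / total for every distinct word
tf = {word: count / total for word, count in counts.items()}

for word, value in tf.items():
    print(f"TF({word!r}) = {counts[word]}/{total} = {value:.3f}")
```

Because every token is counted exactly once, the TF values of a single document sum to 1 under this definition.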

---

### **3. Key Points**

* TF only considers **frequency in a single document**.
* It **does not consider the importance across multiple documents** (that's where TF-IDF comes in).
* Common words like "the", "is", "and" usually have high TF but may not be important semantically.
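
The last point is easy to see on a small example. The sketch below uses a made-up sentence (an assumption for illustration, not taken from any corpus) and ranks its words by TF; the function words come out on top even though "cat", "dog" and "yard" carry the meaning.

```python
from collections import Counter

# Hypothetical sentence chosen for illustration
text = "the cat is on the mat and the dog is in the yard"
tokens = text.split()
counts = Counter(tokens)

# Rank distinct words by term frequency, highest first
ranked = sorted(
    ((word, count / len(tokens)) for word, count in counts.items()),
    key=lambda pair: pair[1],
    reverse=True,
)

# Show the two highest-scoring words
for word, tf in ranked[:2]:
    print(f"{word}: {tf:.3f}")
```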

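The example document can be passed through scikit-learn's `CountVectorizer` to get raw term counts, which are then normalized into TF values:

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer

# Sample document
doc = ["I love NLP and I love Python"]

# CountVectorizer produces raw term counts (one row per document)
vectorizer = CountVectorizer()
X = vectorizer.fit_transform(doc)

# Feature names
print("Vocabulary:", vectorizer.get_feature_names_out())

# Term counts
print("Term counts:", X.toarray())

# Term frequency: each count divided by the total number of counted tokens
counts = X.toarray()[0]
tf = counts / np.sum(counts)
print("Term Frequency (TF):", tf)
```

```
Vocabulary: ['and' 'love' 'nlp' 'python']
Term counts: [[1 2 1 1]]
Term Frequency (TF): [0.2 0.4 0.2 0.2]
```

These values differ from the manual example: with its default settings, `CountVectorizer` lowercases the text and drops single-character tokens such as "I", so the document is counted as 5 tokens rather than 7.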


