# Author Info
---
Name: **Ejaz-ur-Rehman**\
Business Unit Head | Data Analyst\
MBA (Accounting & Finance), MS (Finance)\
Crystal Tech (Project of MUZHAB Group)\
Karachi, Pakistan

![Date](https://img.shields.io/badge/Date-26--Aug--2025-green?logo=google-calendar)
[![Email](https://img.shields.io/badge/Email-ijazfinance%40gmail.com-blue?logo=gmail)](mailto:ijazfinance@gmail.com)
[![LinkedIn](https://img.shields.io/badge/LinkedIn-Ejaz--ur--Rehman-blue?logo=linkedin)](https://www.linkedin.com/in/ejaz-ur-rehman/)
[![GitHub](https://img.shields.io/badge/GitHub-ejazurrehman-black?logo=github)](https://github.com/ejazurrehman)

# Factors and Multiples in Data Science
- Factors and multiples seem like basic school-level math (e.g., the factors of 12 are 1, 2, 3, 4, 6, and 12), but in data science these concepts play useful roles in algorithms, optimization, cryptography, and computational efficiency.
---
### 1. Cryptography & Data Security (Number Theory Application):
- RSA relies on the difficulty of factoring very large numbers. ECC relies on a different hard problem, the elliptic-curve discrete logarithm.
- Factors: Hard to compute for huge numbers → forms the basis of encryption strength.
- Multiples: Used in modular arithmetic for encryption/decryption.
- Example: RSA encryption works because factoring a product of two large primes is computationally difficult.
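The RSA example is easier to follow with numbers. The sketch below is a toy version that uses the small textbook primes 61 and 53. These primes were chosen for readability and are not secure. Real RSA keys use primes that are hundreds of digits long, along with padding schemes that this sketch leaves out. The code also shows that anyone who can factor the public modulus can rebuild the private key.

```python
import math

# Toy RSA with tiny textbook primes (illustrative assumption; insecure by design).
p, q = 61, 53
n = p * q                 # 3233, the public modulus
phi = (p - 1) * (q - 1)   # 3120, Euler's totient of n
e = 17                    # public exponent; must share no factor with phi
assert math.gcd(e, phi) == 1
d = pow(e, -1, phi)       # 2753, the private exponent (modular inverse, Python 3.8+)

message = 65
cipher = pow(message, e, n)     # encrypt: message^e mod n
decrypted = pow(cipher, d, n)   # decrypt: cipher^d mod n

def smallest_factor(n):
    """Trial division up to sqrt(n): instant for 3233, hopeless for a 2048-bit modulus."""
    for i in range(2, math.isqrt(n) + 1):
        if n % i == 0:
            return i
    return n

# An attacker who factors n can recompute phi and therefore d.
f = smallest_factor(n)
recovered_d = pow(e, -1, (f - 1) * (n // f - 1))
print(cipher, decrypted, f, n // f, recovered_d)  # 2790 65 53 61 2753
```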
### 2. Data Partitioning & Batch Processing:
- When dividing large datasets for parallel processing, factors help split data into equal parts.
- Multiples help in deciding batch sizes or window sizes for data processing.
- Example: If you have 1,000,000 rows, choosing a batch size of 2,500 (a factor) makes it easier to divide the dataset evenly across processors.
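Here is a minimal sketch of the 1,000,000-row example. It cuts the row indices into batches of 2,500 and deals them out round-robin to the workers. The worker count of 8 and the helper names `make_batches` and `assign_round_robin` are hypothetical and used only for illustration.

```python
def make_batches(n_rows, batch_size):
    """Return (start, end) index ranges; the last batch is short if batch_size is not a factor."""
    return [(start, min(start + batch_size, n_rows)) for start in range(0, n_rows, batch_size)]

def assign_round_robin(batches, n_workers):
    """Give batch i to worker i % n_workers."""
    assignment = [[] for _ in range(n_workers)]
    for i, batch in enumerate(batches):
        assignment[i % n_workers].append(batch)
    return assignment

n_rows = 1_000_000
batch_size = 2_500   # a factor of 1,000,000
n_workers = 8        # hypothetical processor count

print(n_rows % batch_size)            # 0 -> every batch is full
batches = make_batches(n_rows, batch_size)
print(len(batches))                   # 400
per_worker = assign_round_robin(batches, n_workers)
print([len(b) for b in per_worker])   # [50, 50, 50, 50, 50, 50, 50, 50]

# A batch size that is not a factor leaves a short final batch:
print(make_batches(10, 4))            # [(0, 4), (4, 8), (8, 10)]
```

Every worker gets the same load only because 2,500 divides 1,000,000 and 8 divides the resulting 400 batches.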
### 3. Hashing & Indexing in Databases:
- Hash functions often use modulo operations (related to multiples and factors).
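- Using a prime number as the modulus (the table size) tends to spread data more evenly across the hash table.
- Example: A hash table of size 101 (a prime) usually distributes keys more evenly than one of size 100. The difference is largest when the keys share a common factor with 100, such as keys that are all multiples of 10. With size 100, those keys would pile into a few buckets.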
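The sketch below shows the clustering effect. It assumes the simplest possible hash function, `bucket = key % table_size`, and a hypothetical set of patterned keys that are all multiples of 10. Real hash tables usually scramble the key before taking the modulus, which weakens this effect.

```python
keys = [10 * i for i in range(100)]   # hypothetical keys: 0, 10, 20, ..., 990

def buckets_used(keys, table_size):
    """Number of distinct buckets hit when bucket = key % table_size."""
    return len({k % table_size for k in keys})

print(buckets_used(keys, 100))  # 10: gcd(10, 100) = 10, so only every 10th bucket is reachable
print(buckets_used(keys, 101))  # 100: 101 is prime, gcd(10, 101) = 1, so no buckets are skipped
```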
### 4. Fourier Transform & Signal Processing:
- Data science (especially in image/audio analysis) uses Fourier transforms.
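- The prime factors of the sequence length determine how efficiently the FFT (Fast Fourier Transform) runs. Lengths that factor into small primes run fastest, and powers of 2 are the best case.
- Example: A radix-2 FFT runs fastest when the data length is a power of 2 (like 256, 512, 1024). Being a multiple of 2 is not enough. A length such as 998 (2 × 499) contains the large prime 499, which makes the FFT slower.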
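The power-of-2 requirement comes from the radix-2 Cooley-Tukey algorithm, which repeatedly splits the input in half. Below is a minimal pure-Python sketch, not a library implementation. It rejects lengths that are not powers of 2 and zero-pads a hypothetical 1,000-sample signal up to 1,024. In practice you would call an optimized library such as NumPy's `numpy.fft`, which accepts any length.

```python
import cmath

def is_power_of_two(n):
    return n > 0 and (n & (n - 1)) == 0

def next_power_of_two(n):
    """Smallest power of 2 that is >= n (for n >= 1)."""
    return 1 << (n - 1).bit_length()

def fft(x):
    """Recursive radix-2 Cooley-Tukey FFT; len(x) must be a power of 2."""
    n = len(x)
    if not is_power_of_two(n):
        raise ValueError(f"length {n} is not a power of 2")
    if n == 1:
        return [complex(x[0])]
    even = fft(x[0::2])   # half-size FFT of the even-indexed samples
    odd = fft(x[1::2])    # half-size FFT of the odd-indexed samples
    half = n // 2
    twiddled = [cmath.exp(-2j * cmath.pi * k / n) * odd[k] for k in range(half)]
    return ([even[k] + twiddled[k] for k in range(half)] +
            [even[k] - twiddled[k] for k in range(half)])

def dft(x):
    """Direct O(n^2) DFT, used only to check the FFT on small inputs."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]

signal = [float(i % 7) for i in range(1000)]   # hypothetical data; 1000 is not a power of 2
padded = signal + [0.0] * (next_power_of_two(len(signal)) - len(signal))
spectrum = fft(padded)
print(len(padded))  # 1024
```

Zero-padding changes the frequency grid of the result, because the spectrum is sampled at 1,024 points instead of 1,000. It is therefore a trade-off rather than a free speed-up.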
### 5. Optimization Problems:
- Some data science problems (scheduling, supply chain, resource allocation) use Least Common Multiple (LCM) or Greatest Common Factor (GCF) for optimization.
- Example: If two machines run on cycles of 12 minutes and 18 minutes, LCM = 36. Both machines therefore finish a cycle at the same moment every 36 minutes, which is useful in production scheduling.
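The scheduling example can be written with only the standard `math` module. The 120-minute shift is an assumed time horizon, chosen for illustration. The code computes the LCM from `math.gcd` using lcm(a, b) = a × b / gcd(a, b). Python 3.9+ also provides `math.lcm` directly.

```python
import math

def lcm(a, b):
    """Least common multiple via lcm(a, b) = a * b // gcd(a, b)."""
    return a * b // math.gcd(a, b)

cycle_a, cycle_b = 12, 18   # minutes per cycle for each machine
print(math.gcd(cycle_a, cycle_b))   # 6
print(lcm(cycle_a, cycle_b))        # 36

# Minutes within an assumed 120-minute shift when both machines finish a cycle together
shift = 120
period = lcm(cycle_a, cycle_b)
sync_points = list(range(period, shift + 1, period))
print(sync_points)  # [36, 72, 108]
```

The GCF has a matching interpretation. Six minutes is the longest time slot that divides both cycles evenly (12 = 2 × 6, 18 = 3 × 6).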
### 6. Random Number Generation & Sampling:
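- Many random number generators rely on modular arithmetic.
- Choosing the multiplier, increment, and modulus carefully gives the longest possible cycle before the sequence repeats. For example, the increment should share no factor with the modulus.
- Example: A Linear Congruential Generator (LCG) produces each value as `x_{n+1} = (a * x_n + c) mod m`.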
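The LCG sketch below uses a deliberately tiny modulus (m = 16, an illustrative assumption) so the cycle is easy to see. With a = 5 and c = 3, the generator visits all 16 values before repeating. Changing c to 4, which shares the factor 2 with m, collapses the cycle. The rule behind this is the Hull-Dobell theorem. The period equals the full m exactly when three conditions hold: c and m are coprime, a - 1 is divisible by every prime factor of m, and a - 1 is divisible by 4 if m is.

```python
def lcg_sequence(seed, a, c, m, count):
    """First `count` outputs of x_{k+1} = (a * x_k + c) mod m."""
    out, x = [], seed
    for _ in range(count):
        x = (a * x + c) % m
        out.append(x)
    return out

def cycle_length(seed, a, c, m):
    """Length of the cycle the LCG eventually repeats, starting from seed."""
    seen, x, step = {}, seed, 0
    while x not in seen:
        seen[x] = step
        x = (a * x + c) % m
        step += 1
    return step - seen[x]

m = 16
print(lcg_sequence(1, 5, 3, m, 5))  # [8, 11, 10, 5, 12]
print(cycle_length(1, 5, 3, m))     # 16: full period
print(cycle_length(1, 5, 4, m))     # 2: c = 4 shares the factor 2 with m
```

LCGs like this are adequate for simple simulations but not for security. Python's own `random` module uses the Mersenne Twister instead.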

## 1. Data Partitioning Using Factors:
- Imagine we have a dataset with 1,200,000 records and we want to split it evenly across GPUs or processors.

```python
import math

# total records
records = 1_200_000

# find all factors (divisors) of n by checking candidates only up to sqrt(n):
# every divisor i <= sqrt(n) pairs with the divisor n // i >= sqrt(n)
def find_factors(n):
    small = [i for i in range(1, math.isqrt(n) + 1) if n % i == 0]
    return sorted(set(small + [n // i for i in small]))

factors = find_factors(records)

print("Factors of 1,200,000 (possible equal splits):")
print(factors[:20], "...")  # print first 20 factors
```

```
Factors of 1,200,000 (possible equal splits):
[1, 2, 3, 4, 5, 6, 8, 10, 12, 15, 16, 20, 24, 25, 30, 32, 40, 48, 50, 60] ...
```


**Use case:** You might pick 24, 48, or 96 as batch sizes, since each divides 1,200,000 evenly (giving 50,000, 25,000, or 12,500 batches, respectively).
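A preferred batch size will not always divide the data evenly. One option is to snap it to the nearest factor. The sketch below does this; the helper `closest_divisor` and the target sizes are hypothetical.

```python
def closest_divisor(n, target):
    """Return the divisor of n nearest to target, checking target, target-1, target+1, ...
    (the smaller candidate wins a tie)."""
    target = min(max(target, 1), n)  # keep the search inside 1..n
    for offset in range(n):
        for candidate in (target - offset, target + offset):
            if 1 <= candidate <= n and n % candidate == 0:
                return candidate
    return n

records = 1_200_000
for target in (90, 100, 500):   # hypothetical preferred batch sizes
    size = closest_divisor(records, target)
    print(f"target {target}: batch size {size}, {records // size} batches")
# target 90: batch size 96, 12500 batches
# target 100: batch size 100, 12000 batches
# target 500: batch size 500, 2400 batches
```

The search always terminates, because 1 divides every n.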