# A fast introduction to Machine Learning



> Machine learning is the field of study that gives computers the ability
to learn without being explicitly programmed. Arthur Samuel, 1959

Learning:
> A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P, if its performance at tasks in T, as measured by P, improves with experience E. Tom Mitchell, 1997
 
Machine learning is a branch of artificial intelligence that allows computers to learn from given information and perform new but similar tasks. 
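
Mitchell's definition can be made concrete with a small experiment. The sketch below is only an illustration under assumed choices: the scikit-learn digits dataset, the logistic regression model, and the training sizes were picked for this example and do not come from the quoted sources. The task T is digit classification, the performance measure P is accuracy on held-out images, and the experience E is the number of labeled training examples.

```python
from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Task T: classify 8x8 images of handwritten digits.
X, y = load_digits(return_X_y=True)
X = X / 16.0  # pixel values range from 0 to 16; rescale to [0, 1]

# Hold out a fixed test set to measure performance P (accuracy).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=500, random_state=0
)

# Experience E: train on progressively more labeled examples.
train_sizes = [20, 100, 1000]
accuracies = []
for n in train_sizes:
    model = LogisticRegression(max_iter=2000)
    model.fit(X_train[:n], y_train[:n])
    accuracies.append(model.score(X_test, y_test))

for n, acc in zip(train_sizes, accuracies):
    print(f"E = {n:4d} examples -> P = {acc:.3f} accuracy")
```

If the printed accuracy grows with the number of examples, the program "learns" in Mitchell's sense. The exact numbers depend on the split and the model.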



```{figure} figs/AI-ML.png
---
width: 80%
align: center
name: transistor per micro procesors
---
```


## Artificial intelligence
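The computer performs actions that are regarded as requiring intelligence. This is a moving target: once a task is automated, it tends to stop being considered "intelligent". Main areas: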
- Search Based Heuristic Optimization
- Evolutionary computation
- Logic Programming (inductive logic programming, fuzzy logic)
- Probabilistic Reasoning Under Uncertainty (Bayesian networks)
- Computer Vision
- Natural Language Processing
- Robotics
- Machine Learning

Examples:
- Self-driving cars
- Chatbots based on LLMs: ChatGPT, Gemini, Claude, DeepSeek, Qwen, ... (Inner workings: <https://www.youtube.com/watch?v=NKnZYvZA7w4>)
- Healthcare: Diagnosis from scans
- Finance: Fraud detection
- Retail: Recommender systems
- Transport: Autonomous vehicles
- Creativity: AI art, music, writing


<!-- <div style="text-align: center;">
    <img src="figs/AI-ML.png" alt="Machine learning as a subarea of artificail intelligence. From: Understanding Deep Learning, Simon J.D. Prince" width="600">
    <figcaption>From: Understanding Deep Learning, Simon J.D. Prince</figcaption>
</div> -->

# Historical development and projections
- https://letsdatascience.com/learn/history/history-of-machine-learning/
- https://github.com/microsoft/ML-For-Beginners/blob/main/1-Introduction/2-history-of-ML/README.md
- https://www.inveniam.fr/a-brief-history-of-machine-learning
- https://ahistoryofai.com/

## Major Developments in Machine Learning (20th and 21st Centuries)

| Year | Development / Algorithm | Key Researchers | Seminal Work & Reference Link |
| :--- | :--- | :--- | :--- |
| **1943** | **Artificial Neuron**<br>(McCulloch-Pitts Neuron) | Warren McCulloch,<br>Walter Pitts | *A Logical Calculus of the Ideas Immanent in Nervous Activity*<br>[**[PDF/Link]**](https://www.cs.cmu.edu/~./epxing/Class/10715/reading/McCulloch.and.Pitts.pdf) |
| **1950** | **Turing Test**<br>(Foundations of AI) | Alan Turing | *Computing Machinery and Intelligence*<br>[**[PDF/Link]**](https://academic.oup.com/mind/article/LIX/236/433/986238) |
| **1957** | **The Perceptron**<br>(Single-layer Neural Network) | Frank Rosenblatt | *The Perceptron: A Probabilistic Model for Information Storage and Organization in the Brain*<br>[**[PDF/Link]**](https://psycnet.apa.org/doi/10.1037/h0042519) |
| **1986** | **Backpropagation**<br>(Popularization for Multi-Layer Perceptrons) | D. Rumelhart, G. Hinton,<br>R. Williams | *Learning representations by back-propagating errors*<br>[**[Nature Link]**](https://www.nature.com/articles/323533a0) |
| **1989** | **Convolutional Neural Networks (CNN)**<br>(LeNet predecessor) | Yann LeCun et al. | *Backpropagation Applied to Handwritten Zip Code Recognition*<br>[**[PDF/Link]**](http://yann.lecun.com/exdb/publis/pdf/lecun-89e.pdf) |
| **1995** | **Support Vector Machines (SVM)** | Corinna Cortes,<br>Vladimir Vapnik | *Support-Vector Networks*<br>[**[Springer Link]**](https://link.springer.com/article/10.1007/BF00994018) |
| **1997** | **Long Short-Term Memory (LSTM)**<br>(RNN Architecture) | Sepp Hochreiter,<br>Jürgen Schmidhuber | *Long Short-Term Memory*<br>[**[MIT Press Link]**](https://direct.mit.edu/neco/article-abstract/9/8/1735/6109/Long-Short-Term-Memory) |
| **2001** | **Random Forests** | Leo Breiman | *Random Forests*<br>[**[Springer Link]**](https://link.springer.com/article/10.1023/A:1010933404324) |
| **2006** | **Deep Belief Networks**<br>(Rebirth of Deep Learning) | Geoffrey Hinton et al. | *A Fast Learning Algorithm for Deep Belief Nets*<br>[**[PDF/Link]**](https://www.cs.toronto.edu/~hinton/absps/fastnc.pdf) |
| **2012** | **AlexNet**<br>(Deep Learning on ImageNet) | A. Krizhevsky, I. Sutskever,<br>G. Hinton | *ImageNet Classification with Deep Convolutional Neural Networks*<br>[**[NeurIPS Link]**](https://proceedings.neurips.cc/paper/2012/file/c399862d3b9d6b76c8436e924a68c45b-Paper.pdf) |
| **2013** | **Word2Vec**<br>(Efficient Word Embeddings) | Tomas Mikolov et al. | *Efficient Estimation of Word Representations in Vector Space*<br>[**[ArXiv Link]**](https://arxiv.org/abs/1301.3781) |
| **2014** | **Generative Adversarial Networks (GANs)** | Ian Goodfellow et al. | *Generative Adversarial Networks*<br>[**[NeurIPS Link]**](https://papers.nips.cc/paper/5423-generative-adversarial-nets.pdf) |
| **2015** | **ResNet**<br>(Residual Learning / Very Deep Networks) | Kaiming He et al. | *Deep Residual Learning for Image Recognition*<br>[**[CVPR/ArXiv Link]**](https://arxiv.org/abs/1512.03385) |
| **2016** | **AlphaGo**<br>(Deep RL / MCTS) | David Silver et al. (DeepMind) | *Mastering the game of Go with deep neural networks and tree search*<br>[**[Nature Link]**](https://www.nature.com/articles/nature16961) |
| **2017** | **Transformer Architecture**<br>(Self-Attention Mechanism) | Ashish Vaswani et al. | *Attention Is All You Need*<br>[**[NeurIPS Link]**](https://proceedings.neurips.cc/paper_files/paper/2017/file/3f5ee243547dee91fbd053c1c4a845aa-Paper.pdf) |
| **2018** | **BERT**<br>(Bidirectional Encoder Representations) | Jacob Devlin et al. | *BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding*<br>[**[ArXiv Link]**](https://arxiv.org/abs/1810.04805) |
| **2020** | **GPT-3**<br>(Large Language Models / Few-Shot Learning) | Tom Brown et al. (OpenAI) | *Language Models are Few-Shot Learners*<br>[**[NeurIPS Link]**](https://proceedings.neurips.cc/paper/2020/file/1457c0d6bfcb4967418bfb8ac142f64a-Paper.pdf) |
| **2020** | **Diffusion Models**<br>(Foundation for Modern Image Gen) | Jonathan Ho et al. | *Denoising Diffusion Probabilistic Models*<br>[**[NeurIPS Link]**](https://proceedings.neurips.cc/paper/2020/file/4c5bcfec8584af0d967f1ab10179ca4b-Paper.pdf) |
| **2021** | **AlphaFold 2**<br>(Protein Structure Prediction) | John Jumper et al. (DeepMind) | *Highly accurate protein structure prediction with AlphaFold*<br>[**[Nature Link]**](https://www.nature.com/articles/s41586-021-03819-2) |
| **2021** | **LoRA**<br>(Low-Rank Adaptation / Efficient Fine-tuning) | Edward Hu et al. (Microsoft) | *LoRA: Low-Rank Adaptation of Large Language Models*<br>[**[ArXiv Link]**](https://arxiv.org/abs/2106.09685) |
| **2022** | **Chain-of-Thought (CoT)**<br>(Prompting for Reasoning) | Jason Wei et al. (Google) | *Chain-of-Thought Prompting Elicits Reasoning in Large Language Models*<br>[**[ArXiv Link]**](https://arxiv.org/abs/2201.11903) |
| **2022** | **ReAct**<br>(Foundations of AI Agents) | Shunyu Yao et al. (Princeton/Google) | *ReAct: Synergizing Reasoning and Acting in Language Models*<br>[**[ArXiv Link]**](https://arxiv.org/abs/2210.03629) |
| **2023** | **GPT-4**<br>(Multimodal Large Language Models) | OpenAI | *GPT-4 Technical Report*<br>[**[ArXiv Link]**](https://arxiv.org/abs/2303.08774) |
| **2023** | **Llama 2 / Open Weights**<br>(Democratization of LLMs) | Hugo Touvron et al. (Meta AI) | *Llama 2: Open Foundation and Fine-Tuned Chat Models*<br>[**[ArXiv Link]**](https://arxiv.org/abs/2307.09288) |
| **2023** | **DPO**<br>(RLHF Alternative) | Rafael Rafailov et al. (Stanford) | *Direct Preference Optimization: Your Language Model is Secretly a Reward Model*<br>[**[ArXiv Link]**](https://arxiv.org/abs/2305.18290) |
| **2024** | **Mixture of Experts (MoE)**<br>(Efficient Scaling / Mixtral) | Albert Q. Jiang et al. (Mistral AI) | *Mixtral of Experts*<br>[**[ArXiv Link]**](https://arxiv.org/abs/2401.04088) |
| **2024** | **Long-Context Multimodal**<br>(1M+ Token Context / Gemini 1.5) | Gemini Team (Google) | *Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context*<br>[**[ArXiv Link]**](https://arxiv.org/abs/2403.05530) |
| **2024** | **Reasoning Models (System 2)**<br>(OpenAI o1 / Hidden Chain of Thought) | OpenAI | *OpenAI o1 System Card*<br>[**[OpenAI Report]**](https://openai.com/index/openai-o1-system-card/) |
| **2024** | **DeepSeek-V3**<br>(Hyper-Efficient Training / MLA) | DeepSeek-AI | *DeepSeek-V3 Technical Report*<br>[**[ArXiv Link]**](https://arxiv.org/abs/2412.19437) |




## The Timeline
### **Phase 1: The Golden Age (1950s–1970)**
*   **Algorithms:** Turing Test, Perceptrons (Rosenblatt), Logic Theorist.
*   **Vibe:** Unbridled optimism. "Machines will do everything humans can do within 20 years."
*   **The Crash:** Minsky & Papert published *Perceptrons* (1969), proving mathematically that single-layer perceptrons cannot represent functions that are not linearly separable, such as XOR. Funding dried up.
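
The XOR limitation is easy to reproduce. The following is a minimal sketch using scikit-learn's `Perceptron`. The second half, which adds a hand-made product feature, is an illustration chosen for these notes and is not taken from Minsky & Papert's book.

```python
import numpy as np
from sklearn.linear_model import Perceptron

# XOR truth table: the output is 1 only when exactly one input is 1.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([0, 1, 1, 0])

# A single-layer perceptron draws one straight line, so it can classify
# at most 3 of the 4 XOR points correctly.
linear = Perceptron(max_iter=1000, tol=None, random_state=0)
linear.fit(X, y)
acc_raw = linear.score(X, y)

# Adding the product x1*x2 as a third feature makes XOR linearly separable
# (x1 + x2 - 2*x1*x2 equals XOR), so the same algorithm now succeeds.
X_aug = np.column_stack([X, X[:, 0] * X[:, 1]])
augmented = Perceptron(max_iter=1000, tol=None, random_state=0)
augmented.fit(X_aug, y)
acc_aug = augmented.score(X_aug, y)

print(f"raw inputs: {acc_raw:.2f}, with product feature: {acc_aug:.2f}")
```

The model is the same in both runs; only the representation of the inputs changes. Multi-layer networks trained with backpropagation learn such intermediate features by themselves.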

### **❄️ AI Winter 1 (1974–1980)**
*   **Cause:** Overpromising and underdelivering.
*   **Result:** Government funding (DARPA/UK) slashed. "Neural Networks" became a taboo term.

### **Phase 2: The Knowledge Era (1980–1987)**
*   **Algorithms:** Expert Systems (rule-based), Backpropagation popularization (Hinton/Rumelhart).
*   **Vibe:** AI moved from "learning" to "rules." Companies bought expensive Lisp Machines.
*   **The Crash:** Lisp machines were too expensive and brittle. Desktop PCs (IBM/Apple) became cheaper and faster. The market for specialized AI hardware collapsed.

### **❄️ AI Winter 2 (1987–1993)**
*   **Cause:** Commercial failure of Expert Systems.
*   **Result:** Researchers fled the field or rebranded. The term "Machine Learning" started being used to distance the field from "AI."

### **Phase 3: The Statistical "Quiet" Era (1995–2010)**
*   **Algorithms:** SVMs (Support Vector Machines), Random Forests.
*   **Vibe:** Neural networks were considered "unreliable" and "black boxes." Mathematics and statistics ruled. SVMs were the gold standard.
*   **Hardware context:** Moore's law was still holding, but CPUs remained essentially serial processors. Training deep networks was possible in theory but computationally impractical.

### **Phase 4: The Deep Learning Explosion (2012–Present)**
*   **The Catalyst:** **ImageNet 2012 (AlexNet).**
*   **The Hardware Key:** Researchers realized that **GPUs** (Graphics Processing Units), originally designed for video games, were perfect for the matrix math required by Neural Networks.
*   **Development:**
    *   **2006:** NVIDIA releases **CUDA** (making GPUs programmable).
    *   **2012:** AlexNet destroys the competition using GPUs.
    *   **2017:** Google introduces the **Transformer** (Attention mechanism), enabling massive parallelization on hardware.
    *   **2020+:** Specialized accelerators (**TPUs**, A100s, H100s) allow training models with hundreds of billions of parameters or more (GPT-3 has 175 billion; the size of GPT-4 has not been officially disclosed).
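
To see why neural networks map so well onto GPUs, it helps to write one layer as a matrix product. The sketch below runs on the CPU with NumPy and uses made-up layer sizes. It only shows that the whole layer is a single matrix multiplication whose output entries can all be computed independently.

```python
import numpy as np

rng = np.random.default_rng(0)
batch, n_in, n_out = 64, 256, 128          # hypothetical layer sizes
X = rng.standard_normal((batch, n_in))     # a batch of input vectors
W = rng.standard_normal((n_in, n_out))     # layer weights
b = rng.standard_normal(n_out)             # layer biases

# Whole layer as one matrix product followed by a ReLU.
out_matmul = np.maximum(X @ W + b, 0.0)

# Same computation written neuron by neuron: every output is independent,
# which is why the work can be spread over thousands of GPU cores.
out_loop = np.empty((batch, n_out))
for i in range(batch):
    for j in range(n_out):
        out_loop[i, j] = max(np.dot(X[i], W[:, j]) + b[j], 0.0)

print(np.allclose(out_matmul, out_loop))
print("multiply-adds in this layer:", batch * n_in * n_out)
```

Even this small layer needs about two million multiply-add operations per batch. Real models repeat this across many layers and many training steps.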

---




In [None]:
import matplotlib.pyplot as plt
import numpy as np
from scipy.interpolate import make_interp_spline

# Set up the figure
plt.figure(figsize=(14, 8))

# Data points to simulate the "Hype/Progress" curve
years = [1950, 1960, 1970, 1974, 1980, 1985, 1988, 1993, 2000, 2010, 2012, 2015, 2018, 2024]
hype_levels = [5, 40, 25, 10, 15, 50, 40, 15, 30, 45, 65, 80, 90, 100]

# Smooth the curve
X_Y_Spline = make_interp_spline(years, hype_levels)
X_ = np.linspace(min(years), max(years), 500)
Y_ = X_Y_Spline(X_)
Y_ = np.clip(Y_, 0, 100) # Keep within bounds

# Plot the main line
plt.plot(X_, Y_, color='#1f77b4', linewidth=3, label='AI Capabilities/Hype')

# --- Add AI Winters (Shaded Regions) ---
plt.axvspan(1974, 1980, color='gray', alpha=0.3, label='1st AI Winter (funding cuts)')
plt.axvspan(1987, 1993, color='gray', alpha=0.3, label='2nd AI Winter (Lisp market crash)')

# --- Key Annotations ---

# 1. Early Era
plt.annotate('Perceptrons', xy=(1960, 40), xytext=(1955, 55),
             arrowprops=dict(facecolor='black', shrink=0.05))

# 2. The Second Wave
plt.annotate('Backpropagation\n& Expert Systems', xy=(1985, 50), xytext=(1980, 70),
             arrowprops=dict(facecolor='black', shrink=0.05))

# 3. The Quiet Statistical Era
plt.annotate('SVMs & Random Forests\n(Neural Nets unpopular)', xy=(2000, 30), xytext=(1995, 10),
             arrowprops=dict(facecolor='black', shrink=0.05))

# 4. The Hardware/DL Explosion
plt.annotate('AlexNet (2012)\nDeep Learning Breakthrough', xy=(2012, 65), xytext=(2005, 80),
             arrowprops=dict(facecolor='red', shrink=0.05))

plt.annotate('Transformers / LLMs', xy=(2020, 95), xytext=(2015, 105),
             arrowprops=dict(facecolor='red', shrink=0.05))

# --- Hardware Overlay ---
plt.text(2013, 50, "GPU Revolution\n(NVIDIA CUDA)", color='red', fontsize=10, weight='bold')
plt.text(2018, 60, "TPUs & Massive Compute", color='red', fontsize=10, weight='bold')

# Styling
plt.title('Evolution of Machine Learning: Winters & Hardware Explosions', fontsize=16)
plt.xlabel('Year', fontsize=12)
plt.ylabel('Relative Progress / Hype', fontsize=12)
plt.grid(True, linestyle='--', alpha=0.6)
plt.legend(loc='upper left')

# Show
plt.tight_layout()
plt.show()


## Hardware impact 
Hardware used to be too costly, but improvements in both CPUs and GPUs have made it practical, or at least attainable, to apply the different AI models.

```{figure} figs/transistors-per-microprocessor.png
---
width: 90%
align: center
name: 
---

```
<!-- 
<img src="figs/transistors-per-microprocessor.png" alt="transistor versus microprocesos" width="60%" align="center" /> -->

```{figure} https://epochai.org/assets/images/posts/2022/gpu-perf/gpu-perf-banner.png
---
width: 90 %
align: center
name: 
---

```

<!-- <img src="https://epochai.org/assets/images/posts/2022/gpu-perf/gpu-perf-banner.png" alt="gpu-perf over time" style="width: 80%;"/> -->

Is this the future? <https://ai-2027.com/>
```{figure} https://ai-2027.com/_next/image?url=%2F_next%2Fstatic%2Fmedia%2FepochLLMprice-nowatermark.824fa343.png&w=1920&q=75
---
width: 90 %
align: center
name: 
---

```


Price inflation and the "AI bubble"

```{figure} figs/nvidia.png
---
width: 90%
align: center
name: 
---

```

https://investor.nvidia.com/stock-info/stock-quote-and-chart/default.aspx

```{figure} figs/ailoop.png
---
width: 80 %
align: center
name: 
---

```


---

**New architectures: TPU (Google's big bet):**
- https://considerthebulldog.com/tte-tpu/
- https://jax-ml.github.io/scaling-book/tpus/
- https://henryhmko.github.io/posts/tpu/tpu.html

<img src="https://epoch.ai/assets/images/posts/2023/trends-in-machine-learning-hardware/trends-in-machine-learning-hardware-banner.png" alt="ML hardware prices" width="90%" align="center">

Source: https://epoch.ai/blog/trends-in-machine-learning-hardware

---


<img src="https://epoch.ai/assets/images/posts/2024/how-much-does-it-cost-to-train-frontier-ai-models/how-much-does-it-cost-to-train-frontier-ai-models-banner.png" alt="cost per training" width="90%" align="center">

Source: https://epoch.ai/blog/how-much-does-it-cost-to-train-frontier-ai-models

Check more data: https://epoch.ai/data/machine-learning-hardware

## Impact on software engineering (and some white collar jobs)
```{figure} figs/softwarejobs.png
---
width: 90%
align: center
name: 
---
Source: <https://fred.stlouisfed.org/series/IHLIDXUSTPSOFTDEVE#>

```

```{figure} figs/codejanitor.png
---
width: 80%
align: center
name: 
---
```


```{figure} figs/officejobs.png
---
width: 90 %
align: center
name: 
---
Source: <https://fred.stlouisfed.org/series/LNU02032205#>
```

See: <https://claude.com/blog/cowork-research-preview>

# ML in science
<https://trends.google.com/explore?q=ml%2520physics&date=2004-01-01%202025-12-26&geo=Worldwide>

<https://www.quantamagazine.org/series/science-in-the-age-of-ai/>

- NN - Hopfield: <https://www.quantamagazine.org/the-strange-physics-that-gave-birth-to-ai-20250430/>
- https://www.quantamagazine.org/an-idea-from-physics-helps-ai-see-in-higher-dimensions-20200109/
- Diffusion models: https://www.quantamagazine.org/the-physics-principle-that-inspired-modern-ai-art-20230105/
  
- https://www.quantamagazine.org/where-do-scientists-think-this-is-all-going-20250430/

Virtual assistants:
- <https://research.google/blog/accelerating-scientific-breakthroughs-with-an-ai-co-scientist/>
- <https://arxiv.org/abs/2502.18864>

In [None]:
from IPython.display import IFrame
file_url = "https://drive.google.com/file/d/1NKW1ntAVym0F9nHijnIV8MAESnEvfU0K/preview"
IFrame(file_url, width=640, height=480)
# Source: https://www.reddit.com/r/interesting/comments/1qr1mxb/evolution_of_ai/

## Applications to Scientific Discovery

"AI is becoming a microscope for data: it helps scientists see patterns and predictions that were previously invisible."


Machine Learning and AI are accelerating breakthroughs in scientific research by helping scientists extract patterns from massive datasets, automate complex processes, and even generate new hypotheses.

| Field             | ML/AI Application                                                                   |
| ----------------- | ----------------------------------------------------------------------------------- |
| Astronomy         | Classifying galaxies from telescope data; finding exoplanets (e.g., Kepler mission) |
| Physics           | Simulating particle collisions (e.g., CERN), anomaly detection in LHC data          |
| Biology           | Protein folding prediction (e.g., AlphaFold), gene expression analysis              |
| Chemistry         | Drug discovery by molecular property prediction                                     |
| Climate Science   | Modeling weather and climate patterns; detecting extreme events                     |
| Materials Science | Discovering new materials using generative models and property prediction           |
| Neuroscience      | Brain activity decoding from EEG/fMRI signals                                       |

**AlphaFold** by DeepMind predicts the 3D structure of proteins from amino acid sequences with remarkable accuracy — solving a 50-year grand challenge in biology.
- <https://deepmind.google/science/alphafold/>
- https://www.youtube.com/watch?v=gg7WjuFs8F4


**Google Co-Scientist**
- <https://blog.google/feed/google-research-ai-co-scientist/>
- <https://research.google/blog/accelerating-scientific-breakthroughs-with-an-ai-co-scientist/>


## Where to Find Data in the Basic and Natural Sciences

Finding high-quality, publicly available data is a crucial skill for any scientist. Data is often stored in repositories, which can be general-purpose or field-specific. The articles mentioned previously likely sourced their data from a mix of their own experiments and public databases like these.

### General-Purpose Scientific Data Repositories

These platforms host datasets from a wide variety of scientific fields. They are often used when a field-specific repository doesn't exist or as a requirement for publication in many journals.

- Zenodo: (<https://zenodo.org/>) A general-purpose repository operated by CERN. It accepts data from all fields of science and provides a DOI for every upload, making the data citable.

- Figshare: (<https://figshare.com/>) A platform where researchers can make all of their research outputs available in a citable, shareable, and discoverable manner. It hosts figures, datasets, posters, and code.

- Dryad: (<https://datadryad.org/>) A curated general-purpose repository that makes the data underlying scientific publications discoverable, freely reusable, and citable. It has strong ties to the biosciences but is open to all fields.

### Field-Specific Databases

These are highly curated databases focused on a single area of research.

#### Genomics and Molecular Biology:

- NCBI (National Center for Biotechnology Information): (<https://www.ncbi.nlm.nih.gov/>) A suite of databases, including:
    - Gene Expression Omnibus (GEO): For gene expression data from microarrays and sequencing.
    - Sequence Read Archive (SRA): For raw sequencing data from next-generation sequencers.

- Protein Data Bank (PDB): (<https://www.rcsb.org/>) A database of 3D structural data for large biological molecules like proteins and nucleic acids. This is essential for the kind of molecular dynamics work mentioned in the Sfriso & Crave paper.

#### Environmental and Earth Sciences:

- USGS (U.S. Geological Survey) Data Releases: (<https://www.usgs.gov/data>) Provides public access to water data (matching the Kaur & Godara study), geological data (relevant to the Kahangwa study), and more.

- NOAA (National Oceanic and Atmospheric Administration): (<https://www.noaa.gov/data>) The primary source for climate, weather, and oceanographic data.

- Copernicus: (<https://www.copernicus.eu/en>) The European Union's Earth observation programme, providing vast amounts of satellite imagery and environmental data.

#### Astronomy and Physics:

- Sloan Digital Sky Survey (SDSS): (<https://www.sdss.org/>) A massive survey that has created the most detailed three-dimensional maps of the Universe ever made. It provides images and spectra for millions of celestial objects. (We will use this for our example).

- MAST (Mikulski Archive for Space Telescopes): (<https://mast.stsci.edu/>) The official data archive for NASA's Hubble, James Webb, and other space telescopes.

- ESO Science Archive Facility: (<http://archive.eso.org/>) The data archive for the European Southern Observatory's telescopes, including the Very Large Telescope (VLT).

#### Chemistry:

- PubChem: (<https://pubchem.ncbi.nlm.nih.gov/>) A massive database of chemical molecules and their activities against biological assays.

- Spectral Database for Organic Compounds (SDBS): (<https://sdbs.db.aist.go.jp/>) A free database containing various types of spectra (MS, NMR, IR, Raman) for thousands of organic compounds.

- Matminer: <https://hackingmaterials.lbl.gov/matminer/>, <https://github.com/hackingmaterials/matminer>, <https://www.sciencedirect.com/science/article/abs/pii/S0927025618303252>


### Some problems: challenges and best practices
- Validation: ML models must be validated externally to ensure findings are not merely fitting noise.
- Reproducibility: Ensuring findings can be replicated is a critical challenge, with frameworks like the REFORMS checklist (32 items) being developed to ensure high standards.
- Limitations: While powerful, ML does not replace researchers and should be used to complement, not solely define, scientific discovery.
- Black box, again: models whose predictions are hard to interpret.
- Publications: <https://arstechnica.com/science/2025/12/llms-impact-on-science-booming-publications-stagnating-quality/> : LLMs are profoundly transforming scientific publishing by driving a 23.7%–89.3% increase in manuscript production, particularly benefiting non-native English speakers. While they boost output and help with drafting, literature review, and editing, their use is linked to a decline in substantive quality and a rise in linguistically complex but shallow, "soulless" papers. Furthermore, paper evaluation also relies heavily on LLMs, so, besides input poisoning, we are moving towards an AI-based system in which the human is just the information transmitter.
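
The validation point can be demonstrated with data that contains no signal at all. In the sketch below (synthetic noise, a decision tree, and 5-fold cross-validation, all chosen only for illustration), the model looks perfect on the data it was fitted to, yet performs at about chance level on data it has not seen. Cross-validation is used here as a simple stand-in for validation on truly independent, external data.

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
X = rng.standard_normal((200, 5))   # features: pure noise
y = rng.integers(0, 2, size=200)    # labels: coin flips, unrelated to X

tree = DecisionTreeClassifier(random_state=0)
tree.fit(X, y)
train_acc = tree.score(X, y)        # accuracy on the data it was fitted to

# 5-fold cross-validation: each fold is scored on data not used for fitting.
cv_acc = cross_val_score(tree, X, y, cv=5).mean()

print(f"training accuracy: {train_acc:.2f}")
print(f"cross-validated accuracy: {cv_acc:.2f}")
```

A large gap between the two numbers is a warning sign that the model has memorized noise rather than found a pattern.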


## Quantum ML
```{figure} figs/quantumML-01.png
---
width: 90%
align: center
name: 
---
Source: <https://youtu.be/EKOU3JWDNLI?si=o9YhT0GJ3dNg16K_&t=423>
```

> Quantum machine learning (QML) is the study of quantum algorithms for machine learning. It often refers to quantum algorithms for machine learning tasks which analyze classical data, sometimes called quantum-enhanced machine learning.

- <https://en.wikipedia.org/wiki/Quantum_machine_learning>
- <https://arxiv.org/abs/2310.03011>
- <https://quantum.cloud.ibm.com/learning/en/courses/quantum-machine-learning/introduction>
- <https://pennylane.ai/qml/whatisqml>

Quantum ML is still at a very early stage, although Google has reported a first successful application:
- <https://blog.google/intl/es-es/noticias-compania/nuestro-algoritmo-quantum-echoes-es-un-gran-paso-hacia-las-aplicaciones-reales-de-la-computacion-cuantica/>
- <https://www.youtube.com/watch?v=mEBCQidaNTQ>

```{figure} figs/googleqml.jpg
---
width: 80 %
align: center
name: 
---

```


## Some tools for research
- NotebookLM: <https://notebooklm.google/>
- Google Colab: <https://colab.research.google.com/>
- Awesome AI for Science: <https://github.com/ai-boost/awesome-ai-for-science>
- LLM peer review: <https://github.com/VijayGKR/LLM-Peer-Review>
- AgentReview: <https://agentreview.github.io/>
- AIScientist (<https://arxiv.org/abs/2504.08066>), Google co-scientist (<https://research.google/blog/accelerating-scientific-breakthroughs-with-an-ai-co-scientist/>)
- Review setup: <https://storm.genie.stanford.edu/>
- AI Assistant to Automate Everyday Research Tasks <https://scispace.com/>, also check <https://mystylus.ai/>
- <https://www.connectedpapers.com/>, <https://consensus.app/>
- <https://www.sapiosciences.com/blog/10-scientific-ai-tools-every-scientist-should-know-in-2025-26/>
- <https://scite.ai/>, <https://www.litmaps.com/>
- <https://www.semanticscholar.org/>
- ...

# Practical and short example (also the black box)

```{figure} ./figs/AI-ML.png
---
width: 80 %
align: center
name: 
---

```


<!-- <img src="./figs/AI-ML.png" alt="ML trends" width=40% align="center" > -->

In [None]:
# Data creation
from sklearn.datasets import make_blobs
import matplotlib.pyplot as plt
X, y = make_blobs(n_samples=100, centers=2, random_state=42)
plt.scatter(X[:, 0], X[:, 1], c=y, cmap='bwr')
plt.title("Input data")
plt.show()

### Supervised
- Learn from labeled examples
- Task: Prediction (classification or regression)

In [None]:
# Classification
from sklearn.linear_model import LogisticRegression
import numpy as np

# Train a classifier
model = LogisticRegression()
model.fit(X, y)

# Plot decision boundary
x_min, x_max = X[:, 0].min() - 1, X[:, 0].max() + 1
y_min, y_max = X[:, 1].min() - 1, X[:, 1].max() + 1
xx, yy = np.meshgrid(np.linspace(x_min, x_max, 200), np.linspace(y_min, y_max, 200))
Z = model.predict(np.c_[xx.ravel(), yy.ravel()])
Z = Z.reshape(xx.shape)

plt.contourf(xx, yy, Z, alpha=0.3, cmap='bwr')
plt.scatter(X[:, 0], X[:, 1], c=y, cmap='bwr')
plt.title("Supervised Learning: Logistic Regression")
plt.show()
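
The cell above covers classification. Regression, the other supervised task, predicts a continuous value instead of a class. Here is a minimal sketch on synthetic data, where the "true" slope of 2.5 and intercept of 1.0 are assumptions made up for this example:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
# Synthetic data from an assumed ground truth: target = 2.5 * x + 1 + noise.
x_reg = rng.uniform(0, 10, size=200)
y_reg = 2.5 * x_reg + 1.0 + rng.normal(0, 1.0, size=200)

reg = LinearRegression()
reg.fit(x_reg.reshape(-1, 1), y_reg)   # scikit-learn expects 2D features

print(f"estimated slope: {reg.coef_[0]:.2f}, intercept: {reg.intercept_:.2f}")
print("prediction at x = 4:", reg.predict([[4.0]])[0])
```

The fitted coefficients should land close to the values used to generate the data, which is a quick sanity check that only synthetic data allows.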

### Unsupervised 
- No labels, find structure in data
- Task: Clustering or dimensionality reduction

In [None]:
from sklearn.cluster import KMeans
kmeans = KMeans(n_clusters=2, n_init=10, random_state=42)  # fixed seed for reproducible clusters
kmeans.fit(X)
preds = kmeans.predict(X)
plt.scatter(X[:, 0], X[:, 1], c=preds, cmap='cool')
plt.title("Unsupervised: K-means Clustering")
plt.show()
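
Dimensionality reduction, the second unsupervised task listed above, can be sketched in the same spirit. The example uses PCA on assumed toy data (10-dimensional blobs generated only for this illustration) and projects it onto 2 dimensions without ever using labels.

```python
from sklearn.datasets import make_blobs
from sklearn.decomposition import PCA

# Assumed toy data: 300 points in 10 dimensions, grouped around 3 centers.
X_high, _ = make_blobs(n_samples=300, n_features=10, centers=3, random_state=42)

pca = PCA(n_components=2)
X_2d = pca.fit_transform(X_high)   # no labels are used at any point

print("shape before:", X_high.shape, "after:", X_2d.shape)
print("variance kept by 2 components:", pca.explained_variance_ratio_.sum())
```

The two-dimensional result can be passed to `plt.scatter` exactly as in the clustering cell to inspect the structure visually.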

### Reinforcement learning
- Learn by trial and error
- Agent interacts with environment
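
A minimal sketch of trial-and-error learning is tabular Q-learning on a toy environment. Everything here is a hypothetical setup invented for illustration: a corridor of 5 cells, a reward of 1 for reaching the last cell, and hand-picked learning parameters.

```python
import numpy as np

rng = np.random.default_rng(0)
n_states, n_actions = 5, 2          # corridor cells 0..4; 0 = left, 1 = right
goal = n_states - 1
Q = np.zeros((n_states, n_actions)) # estimated value of each (state, action)
alpha, gamma, epsilon = 0.5, 0.9, 0.2

def step(state, action):
    """Environment: move one cell, reward 1 when the goal is reached."""
    next_state = max(0, state - 1) if action == 0 else min(goal, state + 1)
    reward = 1.0 if next_state == goal else 0.0
    return next_state, reward, next_state == goal

for episode in range(300):
    state = 0
    for _ in range(100):            # cap on episode length
        if rng.random() < epsilon:  # explore: random action
            action = int(rng.integers(n_actions))
        else:                       # exploit: best known action, random ties
            best = np.flatnonzero(Q[state] == Q[state].max())
            action = int(rng.choice(best))
        next_state, reward, done = step(state, action)
        # Q-learning update: move the estimate toward reward + discounted future.
        target = reward + (0.0 if done else gamma * Q[next_state].max())
        Q[state, action] += alpha * (target - Q[state, action])
        state = next_state
        if done:
            break

policy = Q[:goal].argmax(axis=1)    # greedy action in each non-goal cell
print("learned policy (1 = right):", policy)
```

The agent is never told that moving right is correct. It discovers this from the rewards it collects while interacting with the environment.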


In [None]:
%%html

<iframe width="560" height="315" src="https://www.youtube.com/embed/L_4BPjLBF4E?si=QAcI7mfzcNeu1hn6" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" referrerpolicy="strict-origin-when-cross-origin" allowfullscreen></iframe>

## Typical workflow

```{figure} ./figs/ML-workflow.png
---
width: 90 %
align: center
name: 
---

```


<!-- <img src="./figs/ML-workflow.png" alt="ML workflow" width="50%" align="center"> -->


1. **Dataset collection**: Depends on the experiment or goals. What kind of data (categorical, numerical)? How much data? Which units? Is there reference data or an existing database? How will the data be stored and accessed?
2. **Dataset preprocessing**: Clean the data: handle missing values, noise, and outliers, and normalize where needed. Split the data into training and test sets, or into training, validation (for hyperparameters), and test sets.
3. **Model training**: Depends on the actual approach. Supervised learning needs both input and output values; unsupervised learning needs only inputs. Aim to avoid both underfitting and overfitting.
4. **Model evaluation**: Measure how well training worked using predefined metrics. Poor results may require redoing some of the previous steps.
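
The four steps above can be sketched end to end in a few lines. This is an assumed, minimal setup: the built-in iris dataset stands in for real collected data, and logistic regression with standard scaling is just one reasonable choice of model and preprocessing.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# 1. Dataset collection: a small built-in dataset stands in for real data.
X, y = load_iris(return_X_y=True)

# 2. Preprocessing: hold out a test set; normalization is fitted on the
#    training data only, inside the pipeline, to avoid information leakage.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# 3. Model training (supervised: inputs and labels are both needed).
clf = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
clf.fit(X_train, y_train)

# 4. Model evaluation on data the model has never seen.
test_acc = accuracy_score(y_test, clf.predict(X_test))
print(f"test accuracy: {test_acc:.2f}")
```

Putting the scaler inside the pipeline is a deliberate design choice: the test set then plays no role in computing the normalization statistics.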

### Core concepts
- **Data**: examples used for learning
- **Features**: inputs (e.g., age, temperature, pixels)
- **Model**: function that maps input to output
- **Training**: adjusting model to reduce error
- **Testing**: evaluate model on new data
- **Hyperparameters (meta-parameters)**: settings chosen before training that control the model or the learning process

Beware of underfitting and overfitting. See also the last part of <https://stanford.edu/~shervine/teaching/cs-229/cheatsheet-machine-learning-tips-and-tricks>

In [None]:
import numpy as np
import matplotlib.pyplot as plt
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline

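rng = np.random.default_rng(0)  # fixed seed so the figure is reproducible
x = np.linspace(0, 6, 30)
y = np.sin(x) + 0.3 * rng.standard_normal(30)  # noisy samples of sin(x)
X = x[:, np.newaxis]  # scikit-learn expects a 2D feature array

# Dense grid for plotting the true function and the model predictions
x_plot = np.linspace(0, 6, 100).reshape(-1, 1)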

plt.figure(figsize=(15, 4))

# Underfitting (degree=1)
plt.subplot(1, 3, 1)
model_under = make_pipeline(PolynomialFeatures(1), LinearRegression())
model_under.fit(X, y)
plt.scatter(x, y, label='data')
plt.plot(x_plot, np.sin(x_plot), label='true function')
plt.plot(x_plot, model_under.predict(x_plot), label='underfit model')
plt.title("Underfitting")
plt.legend()

# Good fit (degree=3)
plt.subplot(1, 3, 2)
model_good = make_pipeline(PolynomialFeatures(3), LinearRegression())
model_good.fit(X, y)
plt.scatter(x, y, label='data')
plt.plot(x_plot, np.sin(x_plot), label='true function')
plt.plot(x_plot, model_good.predict(x_plot), label='good fit')
plt.title("Good Fit")
plt.legend()

# Overfitting (degree=15)
plt.subplot(1, 3, 3)
model_over = make_pipeline(PolynomialFeatures(15), LinearRegression())
model_over.fit(X, y)
plt.scatter(x, y, label='data')
plt.plot(x_plot, np.sin(x_plot), label='true function')
plt.plot(x_plot, model_over.predict(x_plot), label='overfit model')
plt.title("Overfitting, poor generalization")
plt.legend()

plt.tight_layout()
plt.show()
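
Underfitting and overfitting can also be measured rather than judged by eye, by comparing the error on the training data with the error on held-out data. The sketch below reuses the same kind of noisy sine data, with a sample size, split, and seed assumed for this example.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(0)
x_all = rng.uniform(0, 6, size=60)
y_all = np.sin(x_all) + 0.3 * rng.standard_normal(60)

# Half of the points are used for fitting, the other half only for testing.
X_train, X_test, y_train, y_test = train_test_split(
    x_all.reshape(-1, 1), y_all, test_size=0.5, random_state=0
)

train_mse, test_mse = {}, {}
for degree in (1, 3, 15):
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
    model.fit(X_train, y_train)
    train_mse[degree] = mean_squared_error(y_train, model.predict(X_train))
    test_mse[degree] = mean_squared_error(y_test, model.predict(X_test))
    print(f"degree {degree:2d}: train MSE = {train_mse[degree]:.3f}, "
          f"test MSE = {test_mse[degree]:.3f}")
```

An underfit model typically shows high error on both sets. An overfit model typically shows a low training error together with a much higher test error. The polynomial degree is a hyperparameter, and in practice it would be chosen on a validation set.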

## ML Algorithms
- https://www.datacamp.com/cheat-sheet/machine-learning-cheat-sheet
- https://sites.google.com/view/datascience-cheat-sheets
- https://github.com/SamBelkacem/AI-ML-cheatsheets
- https://www.naftaliharris.com/blog/visualizing-k-means-clustering/


```{figure} ./figs/ML-CheatSheet-01.webp
---
width: 80 %
align: center
name: 
---

```
<!-- <img src="./figs/ML-CheatSheet-01.webp" alt="Some algs ML" width="50%" align="center"> -->

```{figure} ./figs/ML-Cheat-Sheet_2.png
---
width: 90 %
align: center
name: 
---

```
<!-- <img src="./figs/ML-Cheat-Sheet_2.png" alt="ML cheatsheet" width="50%" align="center"> -->

### Classifier comparison:
<https://scikit-learn.org/stable/auto_examples/classification/plot_classifier_comparison.html>

<img src="https://scikit-learn.org/stable/_images/sphx_glr_plot_classifier_comparison_001.png" alt="Classifiers" width="90%" align="center">

### TensorFlow Playground
Try: <https://playground.tensorflow.org>

# ML Ethics, risks and future

## Bias
<img src="https://www.approximatelycorrect.com/wp-content/uploads/2016/11/futurama-judge.png" width=50%>

- <http://approximatelycorrect.com/2016/11/07/the-foundations-of-algorithmic-bias/>, <https://www.theverge.com/2024/2/21/24079371/google-ai-gemini-generative-inaccurate-historical>, <https://www.economist.com/united-states/2024/02/28/is-googles-gemini-chatbot-woke-by-accident-or-design>  
- https://www.edx.org/course/data-science-ethics-michiganx-ds101x-1 
- https://en.wikipedia.org/wiki/Weapons_of_Math_Destruction?useskin=vector
- Bias in algorithms (data reflects societal bias).
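
How a model inherits bias from its training data can be shown with a fully synthetic example. All quantities below are hypothetical: two groups with identical skill distributions, and historical labels that applied a stricter threshold to group 1. The model is trained only to reproduce those labels.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 4000
skill = rng.standard_normal(n)          # same distribution in both groups
group = rng.integers(0, 2, size=n)      # hypothetical group membership (0 or 1)

# Assumed biased history: members of group 1 needed a higher skill to be
# labeled positive (threshold 1.0 instead of 0.0).
label = (skill > np.where(group == 1, 1.0, 0.0)).astype(int)

features = np.column_stack([skill, group])
model = LogisticRegression()
model.fit(features, label)
pred = model.predict(features)

rate_0 = pred[group == 0].mean()   # fraction predicted positive in group 0
rate_1 = pred[group == 1].mean()   # fraction predicted positive in group 1
print(f"positive rate, group 0: {rate_0:.2f}; group 1: {rate_1:.2f}")
```

The model is accurate with respect to its labels and still treats the groups differently, because the disparity was already present in the data. Comparing positive rates per group is one simple check among several fairness measures.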

## "AGI"
- <https://en.wikipedia.org/wiki/Artificial_general_intelligence?useskin=vector>

<img src="https://upload.wikimedia.org/wikipedia/commons/thumb/c/c4/Performance_on_benchmarks_compared_to_humans_-_2024_AI_index.jpg/1920px-Performance_on_benchmarks_compared_to_humans_-_2024_AI_index.jpg" width=80%>

<img src="https://upload.wikimedia.org/wikipedia/commons/thumb/1/1e/Estimations_of_Human_Brain_Emulation_Required_Performance.svg/1920px-Estimations_of_Human_Brain_Emulation_Required_Performance.svg.png" width=80%>

- Can we really simulate the brain? Neural scaling laws:

```{figure} figs/ai-scaling.png
---
width: 80 %
align: center
name: 
---

```

  - CHECK: <https://www.youtube.com/watch?v=5eqRuVp65eY>
  - <https://arxiv.org/abs/2001.08361>
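
The scaling-law paper linked above models the test loss as a power law in model size, $L(N) = (N_c / N)^{\alpha_N}$. The sketch below evaluates that formula. The constants are approximate values as I recall them from the paper and should be treated as assumptions. The function name `loss_from_params` is hypothetical.

```python
# Approximate constants (assumed, recalled from arXiv:2001.08361).
ALPHA_N = 0.076  # power-law exponent for model size
N_C = 8.8e13     # scale constant, in non-embedding parameters


def loss_from_params(n_params):
    """Predicted loss for a model with n_params parameters, L(N) = (N_c / N)^alpha_N."""
    return (N_C / n_params) ** ALPHA_N


for n in (1e6, 1e8, 1e10, 1e12):
    print(f"N = {n:.0e}  ->  predicted loss ~ {loss_from_params(n):.2f}")

# Effect of a 10x increase in parameters: the loss is multiplied by 10**(-ALPHA_N).
ratio = loss_from_params(1e9) / loss_from_params(1e8)
print(f"loss ratio for 10x more parameters: {ratio:.3f}")
```

With an exponent this small, each tenfold increase in parameters lowers the predicted loss by only about 16%, which is why the compute and energy cost of further gains grows so quickly.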

### The illusion of thinking
- <https://machinelearning.apple.com/research/illusion-of-thinking>
- [Seven replies to the viral Apple reasoning paper – and why they fall short](https://garymarcus.substack.com/p/seven-replies-to-the-viral-apple)
- Limitations of Apple's paper: <https://www.youtube.com/watch?v=9wZhFPGewz0>

```{figure} https://mlr.cdn-apple.com/media/main_figure_f794f49488.png
---
width: 80 %
align: center
name: 
---

```


## Hardware prices
- https://lifehacker.com/tech/ram-prices-going-up
- https://pcpartpicker.com/trends/price/memory/
- Nvidia stock: https://finance.yahoo.com/quote/NVDA/
- https://investor.nvidia.com/stock-info/stock-quote-and-chart/default.aspx
- https://www.laptopoutlet.co.uk/blog/gpu-prices-2020-to-2025-analysis.html

```{figure} https://cdna.pcpartpicker.com/static/forever/images/trends/2026.01.30.usd.ram.ddr4.3000.2x8192.2b5e5afee034453ed362d07a8d2f9131.png
---
width: 90 %
align: center
name: 
---
```


## Interpretability and Understanding
- Transparency (black-box models).
- Vibe coding: code full of security holes, written without understanding it
- Seniors are the new juniors, juniors are the new technicians

```{figure} https://media.licdn.com/dms/image/v2/D4D12AQEqy_ok1gMz5Q/article-cover_image-shrink_720_1280/B4DZZ3zDFdGsAM-/0/1745766597968?e=2147483647&v=beta&t=__9JD2teK0NaV-9GnDAsi7Zk-eduHv9pmW5N15wWvjk
---
width: 80 %
align: center
name: 
---

```

## Jobs
- Automation and the future of work.
- Automatic CV acceptance/rejection
- Automatic progress assessment

```{figure} figs/softwarejobs.png
---
width: 90%
align: center
name: 
---
Source: <https://fred.stlouisfed.org/series/IHLIDXUSTPSOFTDEVE#>

```

## War
- Autonomous bombing/attack
- Autonomous target identification / lack of context
- Lack of "human in the loop"
- <https://www.war.gov/News/Releases/Release/Article/4376420/war-department-launches-ai-acceleration-strategy-to-secure-american-military-ai/>
- https://gjia.georgetown.edu/2024/07/12/war-artificial-intelligence-and-the-future-of-conflict/

```{figure} figs/aiwar.png
---
width: 40%
align: center
name: 
---

```

Check: Cities covered by fiber optics <https://youtube.com/shorts/pfG4ZYxj67w?si=sM9iP6oMv3QU3E2G>


## Censorship
- AI alignment and safety: <https://github.com/asgeirtj/system_prompts_leaks>
- <https://www.abc.net.au/news/2025-06-04/beijing-ai-and-censors-erase-tiananmen-square-massacre/105370772>
- Using chatbots as search agents
- <https://time.com/6835213/the-future-of-censorship-is-ai-generated/>
- [Grok suspended on Twitter](https://www.vice.com/en/article/elon-musks-grok-got-suspended-on-twitter-x/), then returned 'healed' and now [checks Elon Musk's views](https://www.cnbc.com/2025/07/11/grok-4-appears-to-reference-musks-views-when-answering-questions-.html) before answering
- https://news.un.org/en/story/2025/05/1162856

```{figure} https://gradientflow.com/wp-content/uploads/2024/07/newsletter106-alignment-siloes.png
---
width: 80 %
align: center
name: 
---

```


## Electricity and hardware use
- https://spectrum.ieee.org/ai-energy-use
- Power usage (https://www.researchgate.net/figure/Reported-energy-consumption-of-training-different-LLM-models-with-respect-to-model_fig5_384115745):
  <img src="https://www.researchgate.net/profile/Yuzhuo-Li-2/publication/384115745/figure/fig5/AS:11431281278937909@1726775834031/Reported-energy-consumption-of-training-different-LLM-models-with-respect-to-model.png" alt="https://www.researchgate.net/publication/384115745_The_Unseen_AI_Disruptions_for_Power_Grids_LLM-Induced_Transients" width="80%" align="center"> 
  + <https://www.nature.com/articles/s41598-024-76682-6>
  + <https://birchtree.me/blog/another-study-on-llm-energy-use/>
  + <https://www.goldmansachs.com/insights/articles/how-ai-is-transforming-data-centers-and-ramping-up-power-demand>
  + <https://www.technologyreview.com/2025/05/20/1116327/ai-energy-usage-climate-footprint-big-tech/>

  

<img src="https://images.prismic.io/bethtechnology/ZnfatZbWFboweyK6_2.jpg?auto=compress,format&q=90&max-w=1000?w=1080" alt="inference energy" width="80%" align="center">

Source: <https://io-fund.com/artificial-intelligence/ai-platforms/ai-power-consumption-becoming-mission-critical>

<img src="https://cdn.statcdn.com/Infographic/images/normal/33730.jpeg" width=80%>

Source: 
- https://www.statista.com/chart/33730/projected-and-currently-operating-nuclear-capacity/
- https://www.statista.com/statistics/513671/number-of-under-construction-nuclear-reactors-worldwide/


<img src="https://images.prismic.io/bethtechnology/aF3wNXfc4bHWixRg_Comparisonofnuclearvs.fossilfuelenergyoutputandemissionsefficiency.png?auto=compress,format&q=90&max-w=1000?w=1920" alt="energy needs" width="80%" align="center">

Source: <https://io-fund.com/artificial-intelligence/nuclear-energy-ai-data-centers>


## Fake videos/news

![ai fake video](figs/ai-fakevideo.png "ai news")

- [AI news Ex1](https://drive.google.com/open?id=1pLI2G7_tbFzZx94GQzuWkT2_-Q98s15J&usp=drive_fs)
- [AI news Ex2](https://drive.google.com/open?id=1GBf6OEMKJdkdy7vIC4pZYk_QatWo68Yh&usp=drive_fs)
- OpenAI Sora: <https://openai.com/index/sora/>
- Google Veo 3 with sound: <https://gemini.google/overview/video-generation/>
- <https://www.reddit.com/r/aivideos/>
- Cloning any voice: <https://huggingface.co/spaces/Qwen/Qwen3-TTS>, <https://simonwillison.net/2026/Jan/22/qwen3-tts/>

In [None]:
from IPython.display import IFrame
file_url = "https://drive.google.com/file/d/1pLI2G7_tbFzZx94GQzuWkT2_-Q98s15J/preview"
IFrame(file_url, width=640, height=480)

In [None]:
from IPython.display import IFrame
file_url = "https://drive.google.com/file/d/1GBf6OEMKJdkdy7vIC4pZYk_QatWo68Yh/preview"
IFrame(file_url, width=640, height=480)

### AI Therapy / sycophancy problem / relationships


```{figure} https://cdn.arstechnica.net/wp-content/uploads/2025/07/robot_therapy_1.jpg
---
width: 50 %
align: center
name: 
---

```


- <https://www.theguardian.com/technology/2025/may/07/experts-warn-therapy-ai-chatbots-are-not-safe-to-use>
- <https://arstechnica.com/ai/2025/07/ai-therapy-bots-fuel-delusions-and-give-dangerous-advice-stanford-study-finds/>
- OpenAI GPT-5 destroyed "AI" boyfriends/girlfriends
- Robot assistants: <https://www.kscale.dev/>, <https://www.1x.tech/neo>, <https://deepmind.google/discover/blog/gemini-robotics-on-device-brings-ai-to-local-robotic-devices/>
- https://techcrunch.com/2025/10/27/openai-says-over-a-million-people-talk-to-chatgpt-about-suicide-weekly/
- https://www.nytimes.com/2025/08/26/technology/chatgpt-openai-suicide.html

```{figure} figs/robot-love.webp
---
width: 40%
align: center
name: 
---

```




# Some "modern" examples and tools, with some demos
- https://karpathy.bearblog.dev/year-in-review-2025/
- https://www.youtube.com/watch?v=EKOU3JWDNLI
- https://raine.dev/blog/my-tmux-setup/ : Running agents in the console
- OpenClaw (formerly Moltbot, formerly Clawdbot) (hype?):
  + <https://openclaw.ai/>
  + <https://www.youtube.com/watch?v=MUDvwqJWWIw>
  + <https://www.youtube.com/watch?v=esXXuejofgk>
  + https://arstechnica.com/information-technology/2026/01/ai-agents-now-have-their-own-reddit-style-social-network-and-its-getting-weird-fast/
  + https://www.moltbook.com/ (and the new molt religion)
- Local models: <https://ollama.com/> , <https://lmstudio.ai/>
