# Code repositories on GitHub:

- TFM Implementacion_de_Mobile_ViT: https://github.com/eleanarey/TFM-MUII-UOC/blob/main/Implementacion_de_Mobile_ViT.ipynb

- Qualcomm Mobile ViT: https://github.com/quic/ai-hub-models/tree/main/qai_hub_models/models/mobile_vit

- Apple Mobile ViT: https://github.com/apple/ml-cvnets/tree/main

- Vision Transformer: https://github.com/google-research/vision_transformer

- CLIP: https://github.com/openai/CLIP


---


# References:

- Mehta, S., Rastegari, M., Caspi, A., & Hajishirzi, H. (2022). **MobileViT: Light-weight, General-purpose, and Mobile-friendly Vision Transformer**. *arXiv preprint arXiv:2110.02178*. Retrieved from [https://arxiv.org/abs/2110.02178](https://arxiv.org/abs/2110.02178)

- Howard, A. G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., & Adam, H. (2017). **MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications**. *arXiv preprint arXiv:1704.04861*. Retrieved from [https://arxiv.org/abs/1704.04861](https://arxiv.org/abs/1704.04861)

- Sandler, M., Howard, A., Zhu, M., Zhmoginov, A., & Chen, L.-C. (2018). **MobileNetV2: Inverted Residuals and Linear Bottlenecks**. *arXiv preprint arXiv:1801.04381*. Retrieved from [https://arxiv.org/abs/1801.04381](https://arxiv.org/abs/1801.04381)

- Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X., Unterthiner, T., ... & Houlsby, N. (2020). **An Image Is Worth 16x16 Words: Transformers for Image Recognition at Scale** (ViT). *arXiv preprint arXiv:2010.11929*. Retrieved from [https://arxiv.org/abs/2010.11929](https://arxiv.org/abs/2010.11929)

- Radford, A., Kim, J. W., Hallacy, C., Ramesh, A., Goh, G., Agarwal, S., ... & Sutskever, I. (2021). **Learning Transferable Visual Models From Natural Language Supervision** (CLIP). *arXiv preprint arXiv:2103.00020*. Retrieved from [https://arxiv.org/pdf/2103.00020](https://arxiv.org/pdf/2103.00020)

- Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., Kaiser, Ł., & Polosukhin, I. (2017). **Attention Is All You Need** (Transformers). *arXiv preprint arXiv:1706.03762*. Retrieved from [https://arxiv.org/abs/1706.03762](https://arxiv.org/abs/1706.03762)

---

# Projects on Hugging Face:

- Apple Mobile ViT Small: https://huggingface.co/apple/mobilevit-small

- Qualcomm Mobile ViT: https://huggingface.co/qualcomm/Mobile_Vit

- MobileNetV2: https://huggingface.co/google/mobilenet_v2_1.0_224

- Vision Transformer (base-sized model): https://huggingface.co/google/vit-base-patch16-224-in21k

- CLIP: https://huggingface.co/openai/clip-vit-base-patch32

---

# Main public datasets referenced and used by the models:

- ImageNet: https://www.image-net.org/

- Flickr8k: https://huggingface.co/datasets/Naveengo/flickr8k

- MSRA-TD500: https://huggingface.co/datasets/yunusserhat/MSRA-TD500-Dataset

---

Code available on Colab:

<a href="https://colab.research.google.com/drive/1Cj246i8QJ8z3Z71Bbfhez4HU1LHlqzZG"
target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [None]:
#@title Licensed under the BSD 3-Clause License
# The license for the original implementation of Mobile_Vit
#Copyright (c) Soumith Chintala 2016,
#All rights reserved.

# https://github.com/pytorch/vision/blob/main/LICENSE

#Redistribution and use in source and binary forms, with or without
#modification, are permitted provided that the following conditions are met:
#
#* Redistributions of source code must retain the above copyright notice, this
#  list of conditions and the following disclaimer.
#
#* Redistributions in binary form must reproduce the above copyright notice,
#  this list of conditions and the following disclaimer in the documentation
#  and/or other materials provided with the distribution.
#
#* Neither the name of the copyright holder nor the names of its
#  contributors may be used to endorse or promote products derived from
#  this software without specific prior written permission.
#
#THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
#AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
#IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
#DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE
#FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
#DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
#SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
#CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
#OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
#OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

In [14]:
#@markdown Select whether you would like to store data in your personal drive.
#@markdown
#@markdown If you select **yes**, you will need to authorize Colab to access
#@markdown your personal drive
#@markdown
#@markdown If you select **no**, then any changes you make will disappear when
#@markdown this Colab's VM restarts after some time of inactivity...
use_gdrive = 'yes'  #@param ["yes", "no"]


if use_gdrive == 'yes':
  from google.colab import drive
  # Mount Google Drive
  # Original line, commented out and replaced by the next one: drive.mount('/gdrive')
  drive.mount('/content/drive', force_remount=True)
  # Working directory in Google Drive
  # Original line, commented out and replaced by the next one: root = '/gdrive/My Drive/vision_transformer_colab'
  root = "/content/drive/MyDrive/Colab_Notebooks"
  import os
  # makedirs also creates missing parent folders and does not fail if the folder exists
  os.makedirs(root, exist_ok=True)
  os.chdir(root)
  print(f'\nChanged CWD to "{root}"')
else:
  from IPython import display
  display.display(display.HTML(
      '<h1 style="color:red">CHANGES NOT PERSISTED</h1>'))

Mounted at /content/drive

Changed CWD to "/content/drive/MyDrive/Colab_Notebooks"


In [3]:
# Clone the repository (uncomment the next line on the first run)
#!git clone https://github.com/quic/ai-hub-models.git
%cd ai-hub-models/qai_hub_models/models/mobile_vit
!pip install -r requirements.txt
!pip install "qai_hub_models[mobile-vit]"
!pip install transformers torch torchvision


/content/drive/MyDrive/Colab_Notebooks/ai-hub-models/qai_hub_models/models/mobile_vit
Collecting transformers==4.41.1 (from -r requirements.txt (line 1))
  Downloading transformers-4.41.1-py3-none-any.whl.metadata (43 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 43.8/43.8 kB 2.9 MB/s eta 0:00:00
Collecting timm==1.0.3 (from -r requirements.txt (line 2))
  Downloading timm-1.0.3-py3-none-any.whl.metadata (43 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 43.6/43.6 kB 3.4 MB/s eta 0:00:00
Collecting tokenizers<0.20,>=0.19 (from transformers==4.41.1->-r requirements.txt (line 1))
  Downloading tokenizers-0.19.1-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (6.7 kB)
Downloading transformers-4.41.1-py3-none-any.whl (9.1 MB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 9.1/9.1 MB 64.4 MB/s eta 0:00:00
Downloading timm-1.0.3-py3-none-



In [9]:
import sys
sys.path.append('/content/drive/MyDrive/Colab_Notebooks/ai-hub-models')
!ls

drive  sample_data


# Deploying the compiled model on Android

The models can be deployed on several runtimes. This section uses the TensorFlow Lite export (`.tflite`).

The following tutorial is a guide to deploying the `.tflite` model in an Android application:

https://ai.google.dev/edge/litert/android?hl=es-419

In [1]:
%cd /content/drive/MyDrive/Colab_Notebooks/build/mobile_vit
!pip install tensorflow==2.13.0 tensorflow-addons==0.20.0 onnx-tf
# pip install keras==2.13.1

import torch
import onnx
from onnx_tf.backend import prepare
import tensorflow as tf

# Load the PyTorch model
from qai_hub_models.models.mobile_vit.model import MobileVIT

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = MobileVIT.from_pretrained()
model.net.classifier = torch.nn.Linear(model.net.classifier.in_features, 2)  # Replace the head with a 2-class layer
# map_location lets a checkpoint saved on GPU load on a CPU-only runtime
model.load_state_dict(torch.load("mobilevit_text_classification_v1.pth", map_location=device))
model.to(device)
model.eval()
print("PyTorch model loaded successfully.")

# Export to ONNX
onnx_model_path = "mobilevit_text_classification_v1.onnx"
dummy_input = torch.randn(1, 3, 224, 224).to(device)  # Dummy input (batch, channels, height, width)
torch.onnx.export(
    model,
    dummy_input,
    onnx_model_path,
    input_names=["input"],
    output_names=["output"],
    dynamic_axes={"input": {0: "batch_size"}, "output": {0: "batch_size"}},
    opset_version=11
)
print(f"Model exported to ONNX: {onnx_model_path}")

# Convert ONNX to TensorFlow
onnx_model = onnx.load(onnx_model_path)
tf_model_path = "mobilevit_text_classification_v1_saved_model"
tf_rep = prepare(onnx_model)
tf_rep.export_graph(tf_model_path)
print(f"Model converted to TensorFlow SavedModel: {tf_model_path}")

# Convert TensorFlow to TFLite
converter = tf.lite.TFLiteConverter.from_saved_model(tf_model_path)
tflite_model = converter.convert()

# Save the TFLite model
tflite_model_path = "mobilevit_text_classification_v1.tflite"
with open(tflite_model_path, "wb") as f:
    f.write(tflite_model)
print(f"Model converted to TFLite: {tflite_model_path}")



/content/drive/MyDrive/Colab_Notebooks/build/mobile_vit


ImportError: This version of TensorFlow Probability requires TensorFlow version >= 2.16; Detected an installation of version 2.13.0. Please upgrade TensorFlow to proceed.

# To deploy a `.tflite` model in an Android application, use the **TensorFlow Lite Interpreter**.

Source: https://developer.android.com/ai/custom

Here is a step-by-step guide:

---

### **1. Set Up the Android Project**

#### **1.1. Add Dependencies**
Add the TensorFlow Lite dependencies to your module-level `build.gradle` file (the `app` module):

```gradle
implementation 'org.tensorflow:tensorflow-lite:2.10.0'
implementation 'org.tensorflow:tensorflow-lite-support:0.4.3' // Utilities such as preprocessing
```

---

### **2. Copy the Model into the Project**
1. Create the `assets` folder in your Android project if it does not exist.
2. Place the `.tflite` file in that folder:
   - Path: `app/src/main/assets/MobileViT.tflite`.
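
Since the `.tflite` file is produced in the notebook, the copy step can also be scripted. The following is a minimal sketch, not part of the original workflow: the function name `copy_model_to_assets` and its parameters are hypothetical, and it assumes the standard `app/src/main/assets` layout described above.

```python
import shutil
from pathlib import Path


def copy_model_to_assets(model_file, project_dir, asset_name="MobileViT.tflite"):
    """Copy a .tflite file into <project_dir>/app/src/main/assets.

    The assets folder is created if it does not exist yet.
    Returns the path of the copied file.
    """
    assets_dir = Path(project_dir) / "app" / "src" / "main" / "assets"
    assets_dir.mkdir(parents=True, exist_ok=True)
    target = assets_dir / asset_name
    shutil.copyfile(model_file, target)
    return target
```

A call such as `copy_model_to_assets("mobilevit_text_classification_v1.tflite", "MyAndroidApp")` would place the exported model under the name the Android code expects. `MyAndroidApp` is a placeholder for your own project folder.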

---

### **3. Implement Inference with TensorFlow Lite**

#### **3.1. Inference Code**
The following code shows how to load the model and run inference:

```java
import android.content.Context;
import android.content.res.AssetFileDescriptor;
import android.graphics.Bitmap;
import android.util.Log;

import org.tensorflow.lite.DataType;
import org.tensorflow.lite.Interpreter;
import org.tensorflow.lite.support.common.ops.NormalizeOp;
import org.tensorflow.lite.support.image.ImageProcessor;
import org.tensorflow.lite.support.image.TensorImage;
import org.tensorflow.lite.support.image.ops.ResizeOp;
import org.tensorflow.lite.support.tensorbuffer.TensorBuffer;

import java.io.FileInputStream;
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;

public class TFLiteInference {

    private static final String TAG = "TFLite";
    private static final int IMAGE_SIZE = 224; // Model input size (224x224)

    private Interpreter tflite;

    public TFLiteInference(Context context, String modelPath) {
        // Load the model from the app's assets
        try {
            tflite = new Interpreter(loadModelFile(context, modelPath));
        } catch (IOException e) {
            Log.e(TAG, "Error loading the model", e);
        }
    }

    private MappedByteBuffer loadModelFile(Context context, String modelPath) throws IOException {
        // Memory-map the asset; the mapping stays valid after the stream is closed
        try (AssetFileDescriptor fileDescriptor = context.getAssets().openFd(modelPath);
             FileInputStream inputStream = new FileInputStream(fileDescriptor.getFileDescriptor())) {
            FileChannel fileChannel = inputStream.getChannel();
            long startOffset = fileDescriptor.getStartOffset();
            long declaredLength = fileDescriptor.getDeclaredLength();
            return fileChannel.map(FileChannel.MapMode.READ_ONLY, startOffset, declaredLength);
        }
    }

    public String predict(Bitmap bitmap, String[] classes) {
        if (tflite == null) {
            return "Model not loaded.";
        }

        // Preprocess the image: resize to 224x224 and normalize to [-1, 1].
        // mean=0.5, std=0.5 on a [0, 1] scale equals mean=127.5, std=127.5 on [0, 255].
        ImageProcessor imageProcessor = new ImageProcessor.Builder()
                .add(new ResizeOp(IMAGE_SIZE, IMAGE_SIZE, ResizeOp.ResizeMethod.BILINEAR))
                .add(new NormalizeOp(127.5f, 127.5f))
                .build();
        TensorImage tensorImage = new TensorImage(DataType.FLOAT32);
        tensorImage.load(bitmap);
        tensorImage = imageProcessor.process(tensorImage);
        // Note: TensorImage yields channel-last (HWC) data; check that this
        // matches the input shape of the converted model.

        // Create the output buffer; classes.length must match the model's output size
        TensorBuffer outputBuffer = TensorBuffer.createFixedSize(new int[]{1, classes.length}, DataType.FLOAT32);

        // Run inference
        tflite.run(tensorImage.getBuffer(), outputBuffer.getBuffer().rewind());

        // Return the class with the highest score
        float[] scores = outputBuffer.getFloatArray();
        int maxIndex = argMax(scores);
        return classes[maxIndex];
    }

    private int argMax(float[] scores) {
        int maxIndex = 0;
        for (int i = 1; i < scores.length; i++) {
            if (scores[i] > scores[maxIndex]) {
                maxIndex = i;
            }
        }
        return maxIndex;
    }
}
```

---

### **4. Integrate with the Main Activity**
Here is an example of how to use the class above to load the model and run inference:

```java
import android.graphics.Bitmap;
import android.graphics.BitmapFactory;
import android.os.Bundle;
import android.util.Log;

import androidx.appcompat.app.AppCompatActivity;

import java.io.IOException;
import java.io.InputStream;

public class MainActivity extends AppCompatActivity {

    private static final String MODEL_PATH = "MobileViT.tflite";
    private static final String[] IMAGENET_CLASSES = {
            "tench", "goldfish", "great white shark", "tiger shark", "hammerhead", /* Add all the classes */
    };

    @Override
    protected void onCreate(Bundle savedInstanceState) {
        super.onCreate(savedInstanceState);
        setContentView(R.layout.activity_main);

        // Create the TFLiteInference instance, passing this activity as the Context
        TFLiteInference inference = new TFLiteInference(this, MODEL_PATH);

        // Load a test image from the assets
        Bitmap bitmap = loadImageFromAssets("example_image.jpg");

        // Run the prediction
        if (bitmap != null) {
            String prediction = inference.predict(bitmap, IMAGENET_CLASSES);
            Log.d("TFLite", "Prediction: " + prediction);
        }
    }

    private Bitmap loadImageFromAssets(String fileName) {
        try (InputStream inputStream = getAssets().open(fileName)) {
            return BitmapFactory.decodeStream(inputStream);
        } catch (IOException e) {
            Log.e("TFLite", "Error loading the image: ", e);
            return null;
        }
    }
}
```
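
The `predict` method above returns only the highest-scoring class. When checking a converted model, it can help to look at the top few candidates with their probabilities. The sketch below shows that post-processing in plain Python. It assumes the model outputs raw scores (logits), which the source does not state. The `scores` values are invented for illustration, and `softmax` and `top_k` are hypothetical helper names.

```python
import math


def softmax(scores):
    """Convert raw scores into probabilities that sum to 1."""
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def top_k(scores, labels, k=3):
    """Return the k (label, probability) pairs with the highest probability."""
    probs = softmax(scores)
    ranked = sorted(zip(labels, probs), key=lambda pair: pair[1], reverse=True)
    return ranked[:k]


# Hypothetical scores for the five labels listed in MainActivity.
labels = ["tench", "goldfish", "great white shark", "tiger shark", "hammerhead"]
scores = [1.2, 3.4, 0.3, -0.5, 2.1]

for label, prob in top_k(scores, labels):
    print(f"{label}: {prob:.3f}")
```

The first entry returned by `top_k` is the same class that `argMax` selects, since softmax preserves the ordering of the scores.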

---

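### **5. Preprocessing**
The MobileViT model used here expects `224x224` images normalized with `mean=0.5, std=0.5`. `TensorImage.fromBitmap` only wraps the bitmap as a UINT8 tensor and does not normalize it, so the normalization must be applied explicitly, for example with `NormalizeOp` from the TensorFlow Lite Support Library. Note also that the ONNX export above uses a channel-first input of shape `(1, 3, 224, 224)`, so verify the input shape of the converted `.tflite` model before feeding it image data.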
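
To make the normalization arithmetic concrete, here is a small sketch in plain Python. It assumes 8-bit pixel values in the range 0 to 255. The function names and the sample values are hypothetical. It shows that scaling to [0, 1] and then applying `mean=0.5, std=0.5` gives the same result as applying `mean=127.5, std=127.5` directly to the 8-bit values, which maps pixels to the range [-1, 1].

```python
def normalize_pixels(pixels, mean=0.5, std=0.5):
    """Scale 8-bit values to [0, 1], then apply (x - mean) / std."""
    return [((p / 255.0) - mean) / std for p in pixels]


def normalize_pixels_fused(pixels, mean=127.5, std=127.5):
    """Apply the equivalent normalization directly on the 0-255 range."""
    return [(p - mean) / std for p in pixels]


sample = [0, 64, 128, 255]  # hypothetical values of one color channel

print(normalize_pixels(sample))
print(normalize_pixels_fused(sample))
```

The fused form is convenient on Android, where image tensors typically hold values in the 0 to 255 range before preprocessing.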

---

### **6. Testing**
1. Copy the `.tflite` file and a test image (`example_image.jpg`) into the `assets` folder.
2. Run the application on an Android device.
3. Check the predictions in `Logcat` or in a UI component (such as a `TextView`).

---

### **Conclusion**
Running a `.tflite` model on Android is efficient and straightforward thanks to TensorFlow Lite's native Android support.