<a href="https://colab.research.google.com/github/galenzo17/AI-personal-test/blob/main/Viste.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [1]:
# Step 1: Install the required packages
print("Installing required packages...")

try:
    import torch
    import torchvision
    import cv2
    import numpy as np
    import matplotlib.pyplot as plt
    from PIL import Image
    print("Packages already installed.")
except ImportError as e:
    print(f"Error importing packages: {e}")
    print("Installing missing packages...")
    !pip install torch torchvision torchaudio --extra-index-url https://download.pytorch.org/whl/cu117
    !pip install opencv-python numpy matplotlib pillow
    import torch
    import torchvision
    import cv2
    import numpy as np
    import matplotlib.pyplot as plt
    from PIL import Image
    print("Packages installed successfully.")


Installing required packages...
Packages already installed.


In [2]:
# Step 2: Import libraries and configure the device
print("Importing libraries and configuring the device...")

import torch
import torchvision
from torchvision import transforms
import cv2
import numpy as np
import time
import matplotlib.pyplot as plt
from PIL import Image

# Use the GPU when one is available, otherwise fall back to the CPU
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
print(f"Using device: {device}")


Importing libraries and configuring the device...
Using device: cuda


In [None]:
# Step 3: Load the pre-trained models
print("Loading pre-trained models...")

# Person detection model (Faster R-CNN pre-trained on COCO)
person_detection_model = torchvision.models.detection.fasterrcnn_resnet50_fpn(weights='DEFAULT')
person_detection_model.to(device)
person_detection_model.eval()

# Activity recognition model (ResNet3D pre-trained on Kinetics-400)
activity_recognition_model = torchvision.models.video.r3d_18(weights='KINETICS400_V1')
activity_recognition_model.to(device)
activity_recognition_model.eval()

print("Models loaded successfully.")


Loading pre-trained models...


Downloading: "https://download.pytorch.org/models/fasterrcnn_resnet50_fpn_coco-258fb6c6.pth" to /root/.cache/torch/hub/checkpoints/fasterrcnn_resnet50_fpn_coco-258fb6c6.pth
100%|██████████| 160M/160M [00:01<00:00, 121MB/s]
Downloading: "https://download.pytorch.org/models/r3d_18-b3b3357e.pth" to /root/.cache/torch/hub/checkpoints/r3d_18-b3b3357e.pth
100%|██████████| 127M/127M [00:00<00:00, 138MB/s]


Models loaded successfully.
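
In evaluation mode, torchvision detection models return one dict per input image with the keys `boxes`, `labels`, and `scores`, and in the COCO label map used by these weights label 1 is `person`. The following is a minimal sketch of filtering such an output for confident person detections. It uses plain Python lists in place of tensors so that it runs without a GPU or a model download; the function name `filter_person_boxes`, the constant `PERSON_LABEL`, and the sample values are illustrative assumptions.

```python
PERSON_LABEL = 1  # COCO category id for "person" in torchvision detection models


def filter_person_boxes(detections, score_threshold=0.8):
    """Return the boxes labelled as person whose score exceeds the threshold."""
    return [
        box
        for box, label, score in zip(
            detections["boxes"], detections["labels"], detections["scores"]
        )
        if label == PERSON_LABEL and score > score_threshold
    ]


# Hypothetical detector output for a single frame (values are made up).
sample = {
    "boxes": [[10, 20, 110, 220], [30, 40, 80, 90], [5, 5, 50, 50]],
    "labels": [1, 18, 1],
    "scores": [0.95, 0.99, 0.42],
}
person_boxes = filter_person_boxes(sample)
print(person_boxes)
```

Only the first box survives here: the second has a different label and the third falls below the 0.8 threshold.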


In [3]:
# Step 4: Define the transforms and helper data
print("Defining transforms and helper functions...")

# Transform for the detection model
detection_transform = transforms.Compose([
    transforms.ToTensor()
])

# Transform for the activity model
activity_transform = transforms.Compose([
    transforms.Resize((112, 112)),
    transforms.ToTensor(),
    transforms.Normalize(
        mean=[0.43216, 0.394666, 0.37645],
        std=[0.22803, 0.22145, 0.216989]),
])

# Kinetics-400 class labels, read from the weights metadata so that
# list positions match the 400 output indices of r3d_18
kinetics_class_names = torchvision.models.video.R3D_18_Weights.KINETICS400_V1.meta["categories"]

print("Transforms and helper functions defined successfully.")


Defining transforms and helper functions...
Transforms and helper functions defined successfully.
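
`ToTensor` scales 8-bit pixel values to the range [0, 1], and `Normalize` then computes `(x - mean) / std` independently for each channel. As a rough illustration of that arithmetic, the sketch below applies it to a single RGB pixel in plain Python, using the mean and standard deviation from the cell above. The helper name `normalize_pixel` is hypothetical.

```python
MEAN = [0.43216, 0.394666, 0.37645]
STD = [0.22803, 0.22145, 0.216989]


def normalize_pixel(rgb):
    """Scale an 8-bit RGB pixel to [0, 1], then standardize each channel."""
    scaled = [value / 255.0 for value in rgb]
    return [(x - m) / s for x, m, s in zip(scaled, MEAN, STD)]


# A mid-gray pixel ends up positive in every channel because
# 128 / 255 is larger than each of the three channel means.
print(normalize_pixel((128, 128, 128)))
```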


In [6]:
# Step 5: Upload a video to the Colab environment
print("Please select a video file to upload:")
from google.colab import files
uploaded = files.upload()

# Get the name of the uploaded file
video_path = next(iter(uploaded.keys()))
print(f"Uploaded video: {video_path}")


Please select a video file to upload:


Saving curso.webm to curso.webm
Uploaded video: curso.webm


In [9]:
# Step 6: Process the video
print("Processing the video...")

# Make sure the models are loaded
if 'person_detection_model' not in globals():
    print("Loading person detection model...")
    person_detection_model = torchvision.models.detection.fasterrcnn_resnet50_fpn(weights='DEFAULT').to(device)
    person_detection_model.eval()

if 'activity_recognition_model' not in globals():
    print("Loading activity recognition model...")
    activity_recognition_model = torchvision.models.video.r3d_18(weights='KINETICS400_V1').to(device)
    activity_recognition_model.eval()

# Initialize the video reader
cap = cv2.VideoCapture(video_path)

if not cap.isOpened():
    print("Error opening the video.")
else:
    print("Video opened successfully.")

frame_rate = cap.get(cv2.CAP_PROP_FPS)
if frame_rate == 0:
    frame_rate = 25  # Default value if the frame rate cannot be read
print(f"Video frame rate: {frame_rate} FPS")

# Accumulated duration in seconds for each detected activity
activity_durations = {}

frame_count = 0
sequence_length = 16  # Number of frames per clip for the activity model
current_sequence = []

try:
    while True:
        ret, frame = cap.read()
        if not ret:
            print("End of video.")
            break

        frame_count += 1
        print(f"Processing frame {frame_count}")

        # Convert the frame to a PIL image and apply the detection transform
        pil_image = Image.fromarray(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
        input_tensor = detection_transform(pil_image).to(device)

        # Person detection
        with torch.no_grad():
            detections = person_detection_model([input_tensor])[0]

        # Keep only high-confidence detections of the person class (COCO label 1)
        person_boxes = []
        for idx, label in enumerate(detections['labels']):
            if label == 1 and detections['scores'][idx] > 0.8:
                person_boxes.append(detections['boxes'][idx].cpu().numpy())

        # If at least one person is detected
        if person_boxes:
            # Add the current frame to the sequence
            current_sequence.append(frame)
            if len(current_sequence) == sequence_length:
                # The sequence is long enough for the activity model,
                # so preprocess every frame in it
                clip = []
                for img in current_sequence:
                    img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
                    img = Image.fromarray(img)
                    img = activity_transform(img)
                    clip.append(img)

                # Stack to (T, C, H, W), reorder to (C, T, H, W), add a batch dimension
                clip = torch.stack(clip).permute(1, 0, 2, 3).unsqueeze(0).to(device)

                # Activity recognition
                with torch.no_grad():
                    outputs = activity_recognition_model(clip)
                    probabilities = torch.nn.functional.softmax(outputs, dim=1)
                    top5_prob, top5_catid = torch.topk(probabilities, 5)

                # Take the activity with the highest probability
                activity = kinetics_class_names[top5_catid[0][0].item()]
                print(f"Detected activity: {activity}")

                # Record the duration covered by this clip
                activity_duration = (sequence_length / frame_rate)
                if activity in activity_durations:
                    activity_durations[activity] += activity_duration
                else:
                    activity_durations[activity] = activity_duration

                # Reset the current sequence
                current_sequence = []
        else:
            # No person detected, so reset the sequence
            current_sequence = []

    # Check whether frames are left over in the sequence at the end of the video
    if current_sequence:
        print("Processing remaining frames in the final sequence...")
        # The leftover frames (fewer than sequence_length) could be processed here if desired

except Exception as e:
    print(f"An error occurred during processing: {e}")

finally:
    cap.release()
    print("Releasing video resources.")

print("Processing complete.")


Processing the video...
Loading person detection model...


Downloading: "https://download.pytorch.org/models/fasterrcnn_resnet50_fpn_coco-258fb6c6.pth" to /root/.cache/torch/hub/checkpoints/fasterrcnn_resnet50_fpn_coco-258fb6c6.pth
100%|██████████| 160M/160M [00:00<00:00, 174MB/s]


Loading activity recognition model...


Downloading: "https://download.pytorch.org/models/r3d_18-b3b3357e.pth" to /root/.cache/torch/hub/checkpoints/r3d_18-b3b3357e.pth
100%|██████████| 127M/127M [00:00<00:00, 184MB/s]


Video opened successfully.
Video frame rate: 1000.0 FPS
Processing frame 1
Processing frame 2
Processing frame 3
Processing frame 4
Processing frame 5
Processing frame 6
Processing frame 7
Processing frame 8
Processing frame 9
Processing frame 10
Processing frame 11
Processing frame 12
Processing frame 13
Processing frame 14
Processing frame 15
Processing frame 16
Detected activity: answering questions
Processing frame 17
Processing frame 18
Processing frame 19
Processing frame 20
Processing frame 21
Processing frame 22
Processing frame 23
Processing frame 24
Processing frame 25
Processing frame 26
Processing frame 27
Processing frame 28
Processing frame 29
Processing frame 30
Processing frame 31
Processing frame 32
Detected activity: answering questions
Processing frame 33
Processing frame 34
Processing frame 35
Processing frame 36
Processing frame 37
Processing frame 38
Processing frame 39
Processing frame 40
Processing frame 41
Processing frame 42
Processing frame 43
Proce
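
The loop above groups frames into non-overlapping clips of 16 and discards a partially filled clip whenever a frame has no confident person detection. The sketch below reproduces only that bookkeeping on a list of booleans (one per frame, `True` when a person was detected), so the windowing can be checked without running either model. The function name `count_complete_clips` and the sample input are assumptions made for illustration.

```python
def count_complete_clips(person_present, sequence_length=16):
    """Count non-overlapping clips built from consecutive frames with a person."""
    clips = 0
    current = 0  # frames accumulated in the clip being built
    for present in person_present:
        if not present:
            current = 0  # a gap discards the partial clip
            continue
        current += 1
        if current == sequence_length:
            clips += 1
            current = 0  # start a fresh clip
    return clips


# 40 frames with a person, a 1-frame gap, then 20 more frames with a person.
flags = [True] * 40 + [False] + [True] * 20
print(count_complete_clips(flags))
```

With this input, the 8 frames just before the gap and the final 4 frames never reach a full clip, so they do not contribute to any activity duration.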

In [10]:
# Step 7: Show the results
print("\nResults:")
for activity, duration in activity_durations.items():
    print(f"Activity: {activity}, Duration: {duration:.2f} seconds")



Results:
Activity: answering questions, Duration: 0.14 seconds
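
Each recognized clip adds `sequence_length / frame_rate` seconds, so the reported total depends directly on the frame rate that OpenCV returns. The run above reported 1000.0 FPS, which makes every 16-frame clip count as only 0.016 seconds. A hedged sketch of one way to guard against implausible values follows; the helper name `plausible_frame_rate`, the fallback of 25 FPS (reusing the notebook's default), and the upper bound of 240 FPS are assumptions, not values taken from OpenCV.

```python
import math


def plausible_frame_rate(reported_fps, fallback=25.0, max_fps=240.0):
    """Return reported_fps if it looks usable, otherwise the fallback."""
    if math.isnan(reported_fps) or reported_fps <= 0 or reported_fps > max_fps:
        return fallback
    return reported_fps


def clip_duration_seconds(sequence_length, frame_rate):
    """Duration in seconds covered by one clip of sequence_length frames."""
    return sequence_length / frame_rate


reported = 1000.0  # value printed by the notebook run
print(clip_duration_seconds(16, reported))
print(clip_duration_seconds(16, plausible_frame_rate(reported)))
```

The fallback is only a guess at the true rate; when the container metadata is unreliable, measuring elapsed time from frame timestamps would likely be more accurate than any fixed default.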
