## Importing libraries

This section imports the main libraries used for text preprocessing.
- **spaCy**: natural language processing (NLP) library used for tasks such as tokenization, stopword removal, and lemmatization.
    - spaCy's tokenizer is also customized to control how tokens are split (for example, to avoid unnecessary splits in words containing hyphens, periods, etc.).
- **pandas**: library for manipulating structured (tabular) data as DataFrames.
- **collections.Counter**: efficient counting of the elements in a collection.
- **sklearn.preprocessing.normalize**: normalization of vectors or matrices.
- **sklearn.metrics.pairwise.cosine_similarity**: cosine similarity between vectors.


In [26]:
# Library providing the preprocessing tools
import spacy

# spaCy internals used to modify the default tokenizer
from spacy.tokenizer import Tokenizer
from spacy.util import compile_infix_regex

# Library for loading data
import pandas as pd

# Library for counting frequencies
from collections import Counter

# Libraries for normalizing and computing cosine similarity
import sklearn
from sklearn.preprocessing import normalize
from sklearn.metrics.pairwise import cosine_similarity

## Function definitions and preprocessing setup

This block contains everything needed to prepare the text before analysis. It uses spaCy's `en_core_web_lg` model and customizes the tokenizer so that it does not split on hyphens.

### Components defined:

- **Loading and customizing the spaCy model**: the tokenizer is modified so that it does not split words on hyphens.
- **`tokenizar_texto_spacy(texto)`**: lowercases a string and converts it into a list of tokens.
- **`eliminar_stopwords_spacy(tokens)`**: removes irrelevant tokens: stopwords, non-alphabetic tokens, and a custom list of excluded words.
- **`lematizar_texto_spacy(tokens)`**: replaces each token with its lemma (base form).
- **`aplicar_mapeo_a_texto(lista_tokens, mapeo)`**: applies a dictionary of lexical equivalences to unify similar terms (defined in a later cell).
- **`palabras_extra`**: hand-curated list of irrelevant words specific to the corpus.
- **`mapeo`**: normalization dictionary built from an external `.csv` file.
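
The hyphen filter applied to the tokenizer is a plain substring test on each infix pattern, so it is worth checking what it actually drops. The sketch below uses a small, hypothetical list of infix-style regex strings (not spaCy's real defaults) to show that `"-" not in x` also discards patterns that merely contain a character range such as `a-z`, and contrasts it with a narrower filter that removes only the hyphen-splitting rule. The names `infijos_ejemplo`, `regla_guion`, `filtro_amplio`, `filtro_estrecho` and `dividir` are assumptions made for illustration.

```python
import re

# Hypothetical, simplified infix patterns (spaCy's real defaults differ).
regla_guion = r"(?<=[a-z])(?:-|--)(?=[a-z])"
infijos_ejemplo = [
    r"\.\.+",                          # ellipsis
    r"(?<=[a-z]),(?=[a-z])",           # comma between letters
    r"(?<=[a-z0-9])[:<>=/](?=[a-z])",  # operators and slashes between letters
    regla_guion,                       # hyphen between letters
]

# Filter as written in the notebook: drops every pattern containing "-".
filtro_amplio = [x for x in infijos_ejemplo if "-" not in x]

# Narrower alternative: drop only the hyphen-splitting rule.
filtro_estrecho = [x for x in infijos_ejemplo if x != regla_guion]


def dividir(texto, patrones):
    """Split text at every infix match, keeping the matched separators."""
    regex = re.compile("|".join(patrones))
    piezas, inicio = [], 0
    for m in regex.finditer(texto):
        if m.start() > inicio:
            piezas.append(texto[inicio:m.start()])
        piezas.append(m.group())
        inicio = m.end()
    if inicio < len(texto):
        piezas.append(texto[inicio:])
    return piezas


print(len(filtro_amplio), len(filtro_estrecho))
print(dividir("ai-based,and/or", filtro_amplio))
print(dividir("ai-based,and/or", filtro_estrecho))
```

With this toy list, the broad filter keeps only the ellipsis rule, so the comma and the slash no longer split the string either. Whether the same happens with spaCy's defaults depends on the installed version; comparing `len(nlp.Defaults.infixes)` with `len(infixes)` is a quick way to check.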


In [27]:
# Load the large English model (includes word vectors)
nlp = spacy.load("en_core_web_lg")

# Remove infix patterns containing a hyphen so that tokens such as "ai-based" are not split at the hyphen
infixes = [x for x in nlp.Defaults.infixes if "-" not in x]
infix_re = compile_infix_regex(infixes)

# Rebuild the tokenizer with the new infix rules
nlp.tokenizer = Tokenizer(
    nlp.vocab,
    rules=nlp.Defaults.tokenizer_exceptions,
    prefix_search=nlp.tokenizer.prefix_search,
    suffix_search=nlp.tokenizer.suffix_search,
    infix_finditer=infix_re.finditer,
    token_match=nlp.tokenizer.token_match
)



# Basic spaCy tokenization; the text is lowercased first.
# Tokens are split according to the rules of the loaded language model,
# taking into account whitespace, punctuation, internal punctuation (hyphens, apostrophes, etc.)
# and predefined exceptions. Here the tokenizer has been modified so that it does not split on hyphens.
def tokenizar_texto_spacy(texto):
    doc = nlp(texto.lower())
    return list(doc)




# Custom list of words to ignore (in addition to spaCy's stopwords)
palabras_extra = [
    "co", "digital", "alexa", "anytime", "cortana", "arduino", "bitcoin",
    "deepl", "digitally", "digitise", "dropbox", "duckduckgo", "e", "etc",
    "eurostat", "example", "facebook", "google", "miro", "t", "twitter",
    "tpm", "youtube", "vs", "whatsapp", "wikipedia", "x", "en", "se", "lego",
    "non", "python", "ros", "scratch", "siri", "-", "non-digital", "one-time",
    "bas", "digital-based", "digitise", "digitised", "digitization", "s", "whilst"
]

# Flag "3d", "3-d", "2d" and "2-d" as alphabetic so the filter below keeps them;
# otherwise its is_alpha check would drop every token containing digits
nlp.vocab["3d"].is_alpha = True
nlp.vocab["3-d"].is_alpha = True
nlp.vocab["2d"].is_alpha = True
nlp.vocab["2-d"].is_alpha = True

# Remove stopwords and non-alphabetic tokens.
# Also drops the words listed in palabras_extra
def eliminar_stopwords_spacy(tokens):
    return [
        token for token in tokens
        if not token.is_stop
        and token.is_alpha
        and token.text.lower() not in palabras_extra
    ]
    
    
    
    
# Lemmatization using the loaded model
def lematizar_texto_spacy(tokens):
    return [token.lemma_ for token in tokens]


# Load the file containing the mappings into a DataFrame so it can be handled as a table
ruta_mapeo = "../utils/mapping.csv"
df_mapeo = pd.read_csv(ruta_mapeo)

# Build the dictionary: original -> equivalent
mapeo = dict(zip(df_mapeo["original"], df_mapeo["equivalente"]))

## Function `comparar_texto_con_frecuencias`

Compares a text with the groups of a frequency table and computes a cosine similarity based on the words they share.

**Parameters:**

- `texto`: list of words from the preprocessed text.
- `df_pivot`: DataFrame of frequencies, with words as rows and groups as columns.
- `top_n`: number of keywords to show per group (default 5).

**How it works, in brief:**

1. Gets the vocabulary from the DataFrame.
2. Builds a vector with the frequencies of the text's words over the vocabulary.
3. Combines this vector with the groups' frequency vectors.
4. Normalizes the vectors for the cosine similarity calculation.
5. Computes the similarity between the text and each group.
6. Identifies the words that contribute most to the similarity for each group.
7. Returns a DataFrame with group, similarity, and keywords, sorted by similarity.
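
For reference, steps 4 to 6 correspond to the standard cosine similarity formula. The symbols are notation introduced for this note, not names from the code: $t_w$ and $g_w$ are the frequencies of word $w$ in the text and in a group, and $V$ is the vocabulary.

$$
\operatorname{sim}(t, g) = \frac{\sum_{w \in V} t_w \, g_w}{\sqrt{\sum_{w \in V} t_w^2} \; \sqrt{\sum_{w \in V} g_w^2}}
$$

Each word adds $t_w \, g_w$ to the numerator, which is the per-word contribution used to rank a group's keywords. The denominator is the same for every word within a group, so it does not affect that ranking.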

**Returns:**

A DataFrame with the columns `'Grupo'`, `'Similitud'`, and `'Palabras_clave'`.
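
As a rough illustration of the calculation described here, the following sketch reproduces it with the standard library only, on a made-up vocabulary and two made-up groups. The names `frecuencias_grupo`, `texto_ejemplo`, `similitud_coseno` and all the data are hypothetical; the notebook itself relies on pandas and scikit-learn.

```python
import math
from collections import Counter

# Hypothetical frequency table: group -> {word: frequency}.
frecuencias_grupo = {
    "A": {"datum": 4, "security": 2, "device": 1},
    "B": {"content": 3, "image": 2, "datum": 1},
}
vocabulario = sorted({p for f in frecuencias_grupo.values() for p in f})

# Preprocessed text as a list of words; out-of-vocabulary words are ignored.
texto_ejemplo = ["datum", "security", "datum", "password"]
conteo = Counter(p for p in texto_ejemplo if p in vocabulario)
vector_texto = [conteo.get(p, 0) for p in vocabulario]


def similitud_coseno(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    producto = sum(a * b for a, b in zip(u, v))
    norma = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return producto / norma if norma else 0.0


resultados = []
for grupo, frecuencias in frecuencias_grupo.items():
    vector_grupo = [frecuencias.get(p, 0) for p in vocabulario]
    # Per-word contribution: text frequency times group frequency.
    contribuciones = sorted(
        ((p, t * g) for p, t, g in zip(vocabulario, vector_texto, vector_grupo) if t * g > 0),
        key=lambda par: par[1],
        reverse=True,
    )
    palabras_clave = [p for p, _ in contribuciones[:5]]
    resultados.append((grupo, similitud_coseno(vector_texto, vector_grupo), palabras_clave))

# Highest similarity first, as in the returned DataFrame.
resultados.sort(key=lambda fila: fila[1], reverse=True)
for fila in resultados:
    print(fila)
```

In this toy case group `A` ranks first because it shares both `datum` and `security` with the text, while `password` is ignored for not being in the vocabulary.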

## Function `aplicar_mapeo_a_texto(lista_tokens, mapeo)`

Replaces each token in a list according to a lexical mapping dictionary, to unify similar terms.
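
A minimal sketch of the mapping step, assuming a CSV with the columns `original` and `equivalente`, as in the code that builds `mapeo`. The rows below are invented for illustration and are not taken from `mapping.csv`. The standard `csv` module stands in for pandas so that the example has no dependencies, and `csv_ejemplo` and `mapeo_ejemplo` are hypothetical names.

```python
import csv
import io

# Hypothetical contents; the real mapping.csv is not shown in this notebook.
csv_ejemplo = io.StringIO(
    "original,equivalente\n"
    "organization,organisation\n"
    "e-mail,email\n"
)

# Build the dictionary: original -> equivalent.
mapeo_ejemplo = {
    fila["original"]: fila["equivalente"] for fila in csv.DictReader(csv_ejemplo)
}


def aplicar_mapeo_a_texto(lista_tokens, mapeo):
    # Tokens without an entry in the mapping are kept unchanged.
    return [mapeo.get(token, token) for token in lista_tokens]


resultado = aplicar_mapeo_a_texto(["e-mail", "organization", "password"], mapeo_ejemplo)
print(resultado)  # ['email', 'organisation', 'password']
```

Using `dict.get(token, token)` means the mapping only needs to list the variants to be unified; every other token passes through untouched.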


In [28]:
def comparar_texto_con_frecuencias(texto, df_pivot, top_n=5):

    # 1. Vocabulary (words that appear in df_pivot)
    vocabulario = df_pivot.index.tolist()

    # 2. Text vector: count only words in the vocabulary
    conteo_texto = Counter(p for p in texto if p in vocabulario)
    vector_texto = pd.Series([conteo_texto.get(p, 0) for p in vocabulario], index=vocabulario).to_frame(name="Texto")

    # 3. Concatenate the text vector with the group vectors
    matriz_completa = pd.concat([df_pivot, vector_texto], axis=1).fillna(0)

    # 4. Normalize the vectors for the cosine calculation
    matriz_normalizada = normalize(matriz_completa.T)

    # 5. Cosine similarity between the text (last row) and each group (all rows but the last)
    similitudes = cosine_similarity([matriz_normalizada[-1]], matriz_normalizada[:-1])[0]

    # 6. Build the basic results table
    nombres_grupo = df_pivot.columns.tolist()
    resultados = pd.DataFrame({
        'Grupo': [g if isinstance(g, str) else ' - '.join(map(str, g)) for g in nombres_grupo],
        'Similitud': similitudes
    })

    # 7. Per-word contributions to each group's similarity
    #    Contribution of a word = text_frequency * group_frequency
    #    Sort and keep the top_n words for each group
    palabras_clave_por_grupo = []
    for grupo in nombres_grupo:
        contribuciones = []
        for palabra in vocabulario:
            f_texto = vector_texto.at[palabra, "Texto"]
            f_grupo = df_pivot.at[palabra, grupo]
            if f_texto > 0 and f_grupo > 0:
                contribuciones.append((palabra, f_texto * f_grupo))

        # Sort by contribution, descending
        contribuciones.sort(key=lambda x: x[1], reverse=True)
        top_palabras = [p for p, _ in contribuciones[:top_n]]
        palabras_clave_por_grupo.append(", ".join(top_palabras))

    resultados["Palabras_clave"] = palabras_clave_por_grupo

    # 8. Sort results by similarity, descending
    resultados = resultados.sort_values(by="Similitud", ascending=False).reset_index(drop=True)

    return resultados

def aplicar_mapeo_a_texto(lista_tokens, mapeo):
    return [mapeo.get(token, token) for token in lista_tokens]

## Block for changing the text to analyze

In this block the user enters the text to be analyzed. The text goes through several preprocessing stages: tokenization, stopword removal, lemmatization, and application of the lexical mapping defined earlier.


In [29]:
# TEXT TO COMPARE
#texto = "Managing Files and Applications. Introducing File Management.  Identify common examples of applications like: office productivity, web browser, communications, social networking, design. Understand the function of the operating system's file management application, desktop, and taskbar to efficiently manage and access files, folders, applications. Identify common icons like those representing: files, folders, applications, printers, drives, shortcuts/aliases, recycle bin/wastebasket/trash. Identify common file types like: word processing, spreadsheet, presentation, portable document format (pdf), image, audio, video, compressed, executable files.Understand how an operating system organises drives, folders, files in a hierarchical structure. Navigate between drives, folders, sub-folders, files. Change view mode to display files and folders like: tiles, icons, list, details. Search for files by properties: all or part of file name using wildcards if necessary, content, date modified. Create a folder. Recognise good practice in folder, file naming: use meaningful names for folders and files to help with searching and organisation. Organising Files and Folders Rename a file, folder. Select individual, adjacent, non-adjacent files, folders. Copy, move files, folders between folders, drives. Delete files, folders to the recycle bin/wastebasket/trash and restore to original location Sort files in ascending, descending order by name, size, type, date modified. Storage Identify the main types of storage media like: internal hard drive, external hard drive, network drive, online/cloud file storage, USB flash drive, memory card Identify file size, folder size, storage capacity measurements like: KB, MB, GB, TB, PB. Display file, folder, drive properties like: name, size, location. Managing Applications Install, uninstall an application Shut down a non-responding application Capture a full screen, active window Networks Network Concepts Define the term network. 
Outline the purpose of a network: to share, access data, applications and devices securely. Understand the concepts of downloading from, uploading to a network. Understand the term Internet. Identify some of its main uses like: information searching, communication, purchasing, selling, learning, publishing, banking, government services, entertainment, software access, file storage. Network Access Identify options for connecting to the Internet like: wired network, wireless network, mobile phone network. Recognise the status of a wireless network: protected/secure, open. Connect to, disconnect from a wireless network Online Information Finding Information Understand the terms: World Wide Web (WWW), Uniform Resource Locator (URL), hyperlink Understand the function of search engines and identify some common examples Carry out a search using a keyword, phrase, exact phrase, image Refine a search using advanced search features like: date, language, media type, usage rights Managing Information Create, delete a bookmarks / favourites folder. Add web pages to a bookmarks / favourites folder. Download, save files to a location Preview, print a web page, selection from a web page using available printing options. Define the terms copyright, intellectual property. Recognise the need to acknowledge sources and/or seek permission as appropriate Web Browser Settings Set the web browser home page Understand the term pop-up. Allow, block pop-ups. Understand the term cookie. Allow, block cookies. Delete history, temporary Internet files, saved form data, saved passwords Online Communication Communication Tools Understand the function and features of email, and identify some common examples. Understand the structure of an email address Understand the function and features of messaging, audio call, video call tools, and identify some common examples Understand the function and features of social networking sites, forums, and identify some common examples. 
Recognise good practice when using communication tools like: use an appropriate communication tool and tone for the audience and content; be accurate, brief, clear; do not inappropriately disclose private or sensitive information; do not circulate inappropriate content; use in accordance with usage policies. Sending Email Create an email. Enter an appropriate title in the subject field and enter, paste content into the body of an email. Enter one or more email addresses, distribution list in the To, Copy (Cc), Blind copy (Bcc) fields, and identify when these should be used. Add, remove a file attachment Send an email. Receiving Email Open, close an email Use the reply, reply to all function, and identify when these should be used. Forward an email. Open, save a file attachment to a location. Email Tools and Settings Recognise options for setting an out of office reply. Mark an email as read, unread. Flag, unflag an email. Create, delete, update a contact, distribution list / mailing list. Organising Emails Search for an email by sender, subject, email content. Sort emails by name, date, size. Create, delete an email folder/label. Move emails to an email folder/label. Delete an email. Restore a deleted email. Move a message to, remove a message from a junk folder. Using Calendars Create, cancel, update a meeting in a calendar. Add invitees, resources (meeting room, equipment) to a meeting in a calendar. Remove invitees, resources from a meeting in a calendar. Accept, decline an invitation. Safety Computers, Devices and Data Understand some potential threats to computers, devices and data like: malware, unauthorised access, theft, accidental damage. Recognise some ways to protect computers, devices and data like: use anti-virus software; regularly update anti-virus, application and operating system software; do not download programs, open attachments, links from unknown sources; use encryption; use strong passwords; regularly back up data to a remote location. 
Recognise some ways to protect personal and organisational data when online like: identify a secure website; purchase from secure reputable websites; avoid unnecessary disclosure of private, sensitive and financial information; log off from websites; be aware of the possibility of fraudulent and unsolicited communications. Use anti-virus software to scan a computer or device.Well Being and Accessibility Recognise ways to help ensure a user’s well- being while using a computer or device like: take regular breaks, ensure appropriate lighting, posture and headphone volume Identify some options available for enhancing accessibility like: voice recognition software, screen reader, screen magnifier, on-screen keyboard, high contrast. Environment Recognise computer and device energy saving practices like: turning off, adjusting display and power mode settings, disabling services when not required. Recognise that computers, devices, equipment, batteries, printer cartridges and paper should be recycled where possible."
texto = "Security Concepts Data Threats Distinguish between data and information. Understand the terms cybercrime, hacking. Recognise malicious, accidental threats to data from individuals, service providers, external organisations. Recognise threats to data from extraordinary circumstances like: fire, floods, war, earthquake. Recognise threats to data from using cloud computing like: data control, potential loss of privacy. Understand basic characteristics of information security like: confidentiality, integrity, availability. Value of Information Understand the reasons for protecting personal information like: avoiding identity theft, fraud, maintaining privacy. Understand the reasons for protecting workplace information on computers and devices like: preventing theft, fraudulent use, accidental data loss, sabotage. Identify common data/privacy protection, retention and control principles like: transparency, legitimate purposes, proportionality. Understand the terms data subjects and data controllers and how data/privacy protection, retention and control principles apply to them. Understand the importance of adhering to guidelines and policies for ICT use and how to access them. Personal Security Understand the term social engineering and its implications like: unauthorised computer and device access, unauthorised information gathering, fraud. Identify methods of social engineering like: phone calls, phishing, shoulder surfing. Understand the term identity theft and its implications: personal, financial, business, legal.Identify methods of identity theft like: information diving, skimming, pretexting. Understand the effect of enabling/disabling macro security settings. File Security Understand the advantages, limitations of encryption. Be aware of the importance of not disclosing or losing the encryption password, key, certificate. Encrypt a file, folder, drive. Set a password for files like: documents, spreadsheets, compressed files. 
Malware Types and Methods Understand the term malware. Recognise different ways that malware can be concealed on computers and devices like: Trojans, rootkits, backdoors. Recognise types of infectious malware and understand how they work like: viruses, worms. Recognise types of data theft, profit generating/extortion malware and understand how they work like: adware, ransomware, spyware, botnets, keystroke logging, diallers. Protection Understand how anti-virus software works and its limitations. Understand that anti-virus software should be installed on computers and devices. Understand the importance of regularly updating software like: anti-virus, web browser, plug-in, application, operating system. Scan specific drives, folders, files using anti-virus software. Schedule scans using anti-virus software. Understand the risks of using obsolete and unsupported software like: increased malware threats, incompatibility. Resolving and Removing Understand the term quarantine and the effect of quarantining infected/suspicious files. Quarantine, delete infected/suspicious files. Understand that a malware attack can be diagnosed and resolved using online resources like: websites of operating system, anti-virus, web browser software providers, websites of relevant authorities. Network Security Networks and Connections Understand the term network and recognise the common network types like: local area network (LAN), wireless local area network (WLAN), wide area network (WAN), virtual private network (VPN). Understand how connecting to a network has implications for security like: malware, unauthorised data access, maintaining privacy. Understand the role of the network administrator in managing authentication, authorisation and accounting, monitoring and installing relevant security patches and updates, monitoring network traffic, and in dealing with malware found within a network. Understand the function, limitations of a firewall in personal, work environment. 
Turn a personal firewall on, off. Allow, block an application, service/feature access through a personal firewall. Wireless Security Recognise different options for wireless security and their limitations like: Wired Equivalent Privacy (WEP), Wi-Fi Protected Access (WPA) / Wi-Fi Protected Access 2 (WPA2), Media Access Control (MAC) filtering, Service Set Identifier (SSID) hiding. Understand that using an unprotected wireless network can lead to attacks like: eavesdroppers, network hijacking, man in the middle. Understand the term personal hotspot. Enable, disable a secure personal hotspot, and securely connect, disconnect devices. Access Control Methods Identify measures for preventing unauthorised access to data like: user name, password, PIN, encryption, multi- factor authentication Understand the term one-time password and its typical use. Understand the purpose of a network account. Understand that a network account should be accessed through a user name and password and locked, logged off when not in use. Identify common biometric security techniques used in access control like: fingerprint, eye scanning, face recognition, hand geometry. Recognise good password policies, like: adequate password length, adequate letter, number and special characters mix, not sharing passwords, changing them regularly, different passwords for different services. Password Management Understand the function, limitations of password manager software. Secure Web Use Browser Settings Select appropriate settings for enabling, disabling autocomplete, autosave when completing a form. Delete private data from a browser like: browsing history, download history, cached Internet files, passwords, cookies, autocomplete data. Secure Browsing Be aware that certain online activity (purchasing, banking) should only be undertaken on secure web pages using a secure network connection. 
Identify ways to confirm the authenticity of a website like: content quality, currency, valid URL, company or owner information, contact information, security certificate, validating domain owner. Understand the term pharming Understand the function and types of content-control software like: Internet filtering software, parental control software. Communications E-Mail Understand the purpose of encrypting, decrypting an e-mail. Understand the term digital signature. Identify possible fraudulent e-mail, unsolicited e-mail. Identify common characteristics of phishing like: using names of legitimate organisations, people, false web links, logos and branding, encouraging disclosure of personal information. Be aware that you can report phishing attempts to the legitimate organisation, relevant authorities. Be aware of the danger of infecting a computer or device with malware by opening an e-mail attachment that contains a macro or an executable file. Social Networking Understand the importance of not disclosing confidential or personal identifiable information on social networking sites. Be aware of the need to apply and regularly review appropriate social networking account settings like: account privacy, location. Apply social networking account settings: account privacy, location. Understand potential dangers when using social networking sites like: cyber bullying, grooming, malicious disclosure of personal content, false identities, fraudulent or malicious links, content, messages. Be aware that you can report inappropriate social network use or behaviour to the service provider, relevant authorities. VoIP and Instant Messaging Understand the security vulnerabilities of instant messaging (IM) and Voice over IP (VoIP) like: malware, backdoor access, access to files, eavesdropping Recognise methods of ensuring confidentiality while using IM and VoIP like: encryption, non-disclosure of important information, restricting file sharing. 
Mobile Understand the possible implications of using applications from unofficial application stores like: mobile malware, unnecessary resource utilisation, access to personal data, poor quality, hidden costs Understand the term application permissions. Be aware that mobile applications can extract private information from the mobile device like: contact details, location history, images. Be aware of emergency and precautionary measures if a device is lost like: remote disable, remote wipe, locate device. Secure Data Management Secure and Back up Data Recognise ways of ensuring physical security of computers and devices like: do not leave unattended, log equipment location and details, use cable locks, access control. Recognise the importance of having a backup procedure in case of loss of data from computers and devices.Identify the features of a backup procedure like: regularity/frequency, schedule, storage location, data compression. Back up data to a location like: local drive, external drive/media, cloud service. Restore data from a backup location like: local drive, external drive/media, cloud service. Secure Deletion and Destruction Distinguish between deleting and permanently deleting data. Understand the reasons for permanently deleting data from drives or devices. Be aware that content deletion may not be permanent on services like: social network site, blog, Internet forum, cloud service. Identify common methods of permanently deleting data like: shredding, drive/media destruction, degaussing, using data destruction utilities."


texto1 = tokenizar_texto_spacy(texto)
texto2 = eliminar_stopwords_spacy(texto1)
texto3 = lematizar_texto_spacy(texto2)
texto4 = aplicar_mapeo_a_texto(texto3, mapeo)

pd.set_option('display.max_rows', None)  # Show all rows
pd.set_option('display.max_columns', None)  # Show all columns
pd.set_option('display.max_colwidth', None)  # Show full cell contents, without truncation


## Comparing the selected text with the DigComp text

### Dimension 2 grouped by area

In [30]:
ruta_d2_areas = "../results/digcomp/frecuencias_d2_areas.csv"
frecuencias_2_areas = pd.read_csv(ruta_d2_areas, index_col=0)

resultados_d2_areas = comparar_texto_con_frecuencias(texto4, frecuencias_2_areas, top_n=5)
print(resultados_d2_areas)

  Grupo  Similitud                                   Palabras_clave
0     4   0.552131       datum, personal, device, security, protect
1     1   0.453257  datum, information, access, awareness, identify
2     2   0.343424           datum, use, service, awareness, access
3     5   0.268744   understand, use, device, information, identify
4     3   0.180551           datum, access, use, content, awareness


### Dimension 2 grouped by competence

In [31]:
ruta_d2_competencias = "../results/digcomp/frecuencias_d2_competencia.csv"
frecuencias_2_competencias = pd.read_csv(ruta_d2_competencias, index_col=0)

resultados_d2_competencias = comparar_texto_con_frecuencias(texto4, frecuencias_2_competencias, top_n=5)
print(resultados_d2_competencias)

   Grupo  Similitud                                   Palabras_clave
0    4.1   0.481844      datum, security, password, access, personal
1    4.2   0.422360       datum, personal, protect, service, privacy
2    1.3   0.379916    datum, use, application, organisation, online
3    2.6   0.367205          datum, personal, service, use, identity
4    1.1   0.301235   information, datum, access, personal, identify
5    1.2   0.286445  information, datum, awareness, identify, social
6    5.4   0.246663          understand, information, use, awareness
7    5.2   0.234608      access, recognise, use, awareness, identify
8    4.4   0.200890       device, datum, use, awareness, environment
9    4.3   0.187791          awareness, social, use, device, protect
10   2.5   0.149327          awareness, social, differ, work, manage
11   2.1   0.148114            social, awareness, use, service, work
12   3.1   0.126783            access, content, use, document, image
13   5.1   0.115824               

### Dimension 3 grouped by competence

In [32]:
ruta_d3_competencias = "../results/digcomp/frecuencias_d3_competencia.csv"
frecuencias_3_competencias = pd.read_csv(ruta_d3_competencias, index_col=0)

resultados_d3_competencias = comparar_texto_con_frecuencias(texto4, frecuencias_3_competencias, top_n=5)
print(resultados_d3_competencias)

   Grupo  Similitud                                      Palabras_clave
0    4.2   0.540929              datum, personal, protect, privacy, way
1    1.1   0.457253                         datum, information, content
2    1.2   0.423454                         datum, information, content
3    4.1   0.392574          security, device, protect, privacy, threat
4    1.3   0.371908  datum, information, content, organisation, storage
5    3.3   0.342571                  datum, information, content, apply
6    2.6   0.338909              datum, protect, service, identity, way
7    4.3   0.201368                social, protect, threat, way, danger
8    2.2   0.171155            information, content, share, appropriate
9    4.4   0.133275                           use, protect, environment
10   5.1   0.114219             device, operate, technique, environment
11   3.2   0.099242                                information, content
12   5.2   0.097431            personal, need, possibility, envi

### Dimension 3 grouped by proficiency level

In [33]:
ruta_d3_tipoNivel = "../results/digcomp/frecuencias_d3_tipoNivel.csv"
frecuencias_3_tipoNivel = pd.read_csv(ruta_d3_tipoNivel, index_col=0)

resultados_d3_tipoNivel = comparar_texto_con_frecuencias(texto4, frecuencias_3_tipoNivel, top_n=5)
print(resultados_d3_tipoNivel)

                Grupo  Similitud                                    Palabras_clave
0            Advanced   0.480306      datum, information, differ, content, protect
1        Intermediate   0.411430    datum, information, personal, content, protect
2          Foundation   0.353286  datum, identify, information, recognise, content
3  Highly specialised   0.004311                                            factor


### Dimension 3 grouped by proficiency sublevel

In [34]:
ruta_d3_subnivel = "../results/digcomp/frecuencias_d3_subnivel.csv"
frecuencias_3_subnivel = pd.read_csv(ruta_d3_subnivel, index_col=0)

resultados_d3_subnivel = comparar_texto_con_frecuencias(texto4, frecuencias_3_subnivel, top_n=5)
print(resultados_d3_subnivel)

  Grupo  Similitud                                    Palabras_clave
0     4   0.509210    datum, information, personal, content, protect
1     5   0.453723        datum, information, differ, apply, content
2     2   0.348774  datum, identify, information, recognise, content
3     1   0.348774  datum, identify, information, recognise, content
4     6   0.304702     datum, information, appropriate, content, way
5     3   0.230689     datum, information, content, way, environment
6     7   0.038483                                      limit, guide
7     8   0.006838                                            factor


### Dimension 4 grouped by competence

In [35]:
ruta_d4_competencias = "../results/digcomp/frecuencias_d4_competencia.csv"
frecuencias_4_competencias = pd.read_csv(ruta_d4_competencias, index_col=0)

resultados_d4_competencias = comparar_texto_con_frecuencias(texto4, frecuencias_4_competencias, top_n=5)
print(resultados_d4_competencias)

   Grupo  Similitud                                    Palabras_clave
0    4.1   0.571562        datum, security, network, password, access
1    4.2   0.409516      datum, security, personal, service, identity
2    1.3   0.354174                                     datum, online
3    2.6   0.351850           datum, personal, service, use, identity
4    1.2   0.257865  information, datum, recognise, awareness, social
5    1.1   0.224953    information, datum, access, awareness, content
6    5.2   0.211209        recognise, access, awareness, service, use
7    4.4   0.188439        device, datum, awareness, use, environment
8    4.3   0.154057       awareness, use, device, application, online
9    2.5   0.146698        awareness, social, differ, manage, message
10   5.4   0.143825               understand, awareness, online, open
11   3.1   0.139977                    access, content, use, document
12   2.1   0.128796      social, awareness, service, use, communicate
13   3.3   0.113105 

### Dimension 4 grouped by example type

In [36]:
ruta_d4_tipoEjemplo = "../results/digcomp/frecuencias_d4_tipoEjemplo.csv"
frecuencias_4_tipoEjemplo = pd.read_csv(ruta_d4_tipoEjemplo, index_col=0)

resultados_d4_tipoEjemplo = comparar_texto_con_frecuencias(texto4, frecuencias_4_tipoEjemplo, top_n=5)
print(resultados_d4_tipoEjemplo)

       Grupo  Similitud                               Palabras_clave
0  Knowledge   0.628410  datum, awareness, access, personal, service
1  Attitudes   0.489680    datum, information, personal, access, use
2     Skills   0.436342      datum, use, information, access, device
