<center>
<img src="https://laelgelcpublic.s3.sa-east-1.amazonaws.com/lael_50_years_narrow_white.png.no_years.400px_96dpi.png" width="300" alt="LAEL 50 years logo">
<h3>APPLIED LINGUISTICS GRADUATE PROGRAMME (LAEL)</h3>
</center>
<hr>

# Corpus Linguistics - Study 1 - Phase 2 - Elaine
# Image annotation proof-of-concept

## Google Cloud Vision API

The Google Cloud Vision API lets developers integrate vision detection features into applications, including image labeling, face and landmark detection, optical character recognition (OCR), and detection of explicit content.

Please refer to:

- [Google Cloud Vision API](https://cloud.google.com/vision?hl=en)
- [Google Cloud Vision API documentation](https://cloud.google.com/vision/docs)
- [Python Client for Cloud Vision](https://cloud.google.com/python/docs/reference/vision/latest)
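Each of the capabilities listed above corresponds to a feature type in a Vision API annotation request. As a rough illustration, the sketch below only builds the JSON body for the v1 `images:annotate` REST endpoint (`https://vision.googleapis.com/v1/images:annotate`); it does not send the request or handle authentication. The helper name `build_annotate_request`, the short feature keys and the `gs://my-bucket/...` URI are hypothetical; the notebook itself uses the Python client instead.

```python
import json

# Cloud Vision REST feature types matching the capabilities listed above
FEATURES = {
    'labels': 'LABEL_DETECTION',
    'faces': 'FACE_DETECTION',
    'landmarks': 'LANDMARK_DETECTION',
    'ocr': 'TEXT_DETECTION',
    'explicit_content': 'SAFE_SEARCH_DETECTION',
}


def build_annotate_request(image_uri, features, max_results=150):
    """Build (but do not send) a JSON body for the v1 images:annotate endpoint."""
    return {
        'requests': [
            {
                # Image stored in Cloud Storage, referenced by its gs:// URI
                'image': {'source': {'imageUri': image_uri}},
                'features': [
                    {'type': FEATURES[name], 'maxResults': max_results}
                    for name in features
                ],
            }
        ]
    }


# Hypothetical bucket and object name, for illustration only
body = build_annotate_request('gs://my-bucket/images/01/1.jpg', ['labels', 'ocr'])
print(json.dumps(body, indent=2))
```

Several features can be requested for the same image in one call; each feature applied to an image is typically billed separately.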

## Required packages

The following packages are required:
- [Google Cloud Storage client library (`google-cloud-storage`)](https://anaconda.org/conda-forge/google-cloud-storage)
- [Google Cloud Vision client library (`google-cloud-vision`)](https://anaconda.org/conda-forge/google-cloud-vision)
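Both packages are on conda-forge, so they can be installed with `conda install -c conda-forge google-cloud-storage google-cloud-vision`. Before running the cells below, a quick check like the following sketch can confirm that the modules are importable. The helper `missing_modules` is hypothetical and uses only the standard library.

```python
import importlib.util


def missing_modules(module_names):
    """Return the names in `module_names` that cannot be found by the import system."""
    missing = []
    for name in module_names:
        try:
            # find_spec imports the parent packages of a dotted name, so it raises
            # ModuleNotFoundError (an ImportError) if e.g. `google` is not installed
            found = importlib.util.find_spec(name) is not None
        except (ImportError, ValueError):
            found = False
        if not found:
            missing.append(name)
    return missing


# An empty list means both client libraries are available
print(missing_modules(['google.cloud.storage', 'google.cloud.vision']))
```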

## Importing the required libraries

In [1]:
```python
import datetime
import os
import subprocess
from google.cloud import storage
from google.cloud import vision
```

## Label annotation

The images were renamed and uploaded to a Google Cloud Storage bucket.

- Number of images: 80
- Processing time: 13 min
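Each cell below prints a timestamp when it starts and when it ends, so processing times can be worked out from its output. The sketch below is a hypothetical helper, not part of the original notebook. It parses those timestamps (same `'%Y-%m-%d %H:%M:%S'` format) and averages a total over the number of images. It imports the `datetime` module rather than the class so that it does not shadow the `datetime` name used by the notebook cells.

```python
import datetime

# Format used by the notebook's timestamp lines
TIMESTAMP_FORMAT = '%Y-%m-%d %H:%M:%S'


def elapsed_seconds(start, end):
    """Seconds between two timestamps printed by the notebook cells."""
    t0 = datetime.datetime.strptime(start, TIMESTAMP_FORMAT)
    t1 = datetime.datetime.strptime(end, TIMESTAMP_FORMAT)
    return (t1 - t0).total_seconds()


def seconds_per_image(total_seconds, n_images):
    """Average processing time per image."""
    return total_seconds / n_images


# Start and end timestamps printed by the upload cell
print(elapsed_seconds('2024-09-02 19:12:14', '2024-09-02 19:13:43'))  # 89.0

# The 13 min reported above, spread over the 80 images
print(seconds_per_image(13 * 60, 80))  # 9.75
```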

### Uploading images to Google Cloud Storage

In [2]:
```python
timestamp = datetime.datetime.now().strftime('%Y-%m-%d %H:%M:%S')
print(timestamp)

def upload_directory_to_bucket(bucket_name, source_directory, destination_blob_name):
    """Upload every file under source_directory to the bucket, keeping relative paths under destination_blob_name."""
    storage_client = storage.Client()
    bucket = storage_client.bucket(bucket_name)

    for root, _dirs, files in os.walk(source_directory):
        for file in files:
            local_path = os.path.join(root, file)
            # Cloud Storage object names always use '/', whatever the local OS separator is
            relative_path = os.path.relpath(local_path, source_directory).replace(os.sep, '/')
            blob = bucket.blob(f'{destination_blob_name}/{relative_path}')
            blob.upload_from_filename(local_path)

    print(f'Directory {source_directory} uploaded to {bucket_name}/{destination_blob_name} successfully!')

# Ask for the project (exported so the client libraries pick it up) until a non-empty name is given
my_project = ''
while not my_project:
    my_project = input('Enter your project name: ').strip()
os.environ['GOOGLE_CLOUD_PROJECT'] = my_project

# Ask for the destination bucket until a non-empty name is given
bucket_name = ''
while not bucket_name:
    bucket_name = input('Enter your bucket name: ').strip()

source_directory = r'./images/images'
destination_blob_name = 'images'
upload_directory_to_bucket(bucket_name, source_directory, destination_blob_name)

timestamp = datetime.datetime.now().strftime('%Y-%m-%d %H:%M:%S')
print(timestamp)
```

2024-09-02 19:12:14


Enter your project name:  build-159206
Enter your bucket name:  laelimages


Directory ./images/images uploaded to laelimages/images successfully!
2024-09-02 19:13:43
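The label detection step rebuilds each image's `gs://` URI from its folder and file name, so it relies on the object names produced by the upload. The sketch below isolates that path mapping as two hypothetical helpers, `blob_name_for` and `gcs_uri`. It uses `posixpath` so that object names keep `/` separators even when the notebook runs on Windows. The folder `01` and file `1.jpg` are made-up examples; `laelimages` is the bucket entered above.

```python
import os
import posixpath


def blob_name_for(local_path, source_directory, destination_prefix):
    """Map a local file path to a Cloud Storage object name with '/' separators."""
    relative = os.path.relpath(local_path, source_directory)
    # Split on the local separator, then re-join with '/' for Cloud Storage
    return posixpath.join(destination_prefix, *relative.split(os.sep))


def gcs_uri(bucket_name, blob_name):
    """Build the gs:// URI that the Vision API reads the image from."""
    return f'gs://{bucket_name}/{blob_name}'


# Hypothetical folder and file name
local = os.path.join('images', 'images', '01', '1.jpg')
name = blob_name_for(local, os.path.join('images', 'images'), 'images')
print(gcs_uri('laelimages', name))  # gs://laelimages/images/01/1.jpg
```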


### Label detection

In [3]:
```python
timestamp = datetime.datetime.now().strftime('%Y-%m-%d %H:%M:%S')
print(timestamp)

def detect_labels(image_uri):
    """Detect labels in the image at a Cloud Storage URI (gs://...) using the Google Cloud Vision API."""
    client = vision.ImageAnnotatorClient()
    image = vision.Image()
    image.source.image_uri = image_uri
    response = client.label_detection(image=image, max_results=150)
    if response.error.message:
        raise Exception(f'Error: {response.error.message}')
    return response.label_annotations

# Create one output folder per image folder listed in the 'folders' file
with open('folders', 'r', encoding='utf8') as f:
    for folder in f:
        folder = folder.strip()
        os.makedirs(f'images/google_cloud/labels/{folder}', exist_ok=True)
        # os.system(f'rm -f images/google_cloud/labels/{folder}/*')

# Ask again for the project and bucket so that this cell can also run on its own
my_project = ''
while not my_project:
    my_project = input('Enter your project name: ').strip()
os.environ['GOOGLE_CLOUD_PROJECT'] = my_project

bucket_name = ''
while not bucket_name:
    bucket_name = input('Enter your bucket name: ').strip()

# Number of the last image, read from the last line of the index (requires the Unix `tail` command)
last = subprocess.run(['tail', '-1', 'images/images_index.txt'], capture_output=True, text=True).stdout.strip().split('|')[1][2:]
last = int(last)

for i in range(1, last + 1):
    try:
        # Read line i of the index; next() raises StopIteration if the file has fewer lines
        with open('images/images_index.txt', 'r', encoding='utf8') as f:
            line = next(line for j, line in enumerate(f, start=1) if j == i)
            fields = line.split('|')
            folder = fields[0][3:]
            n = fields[1][2:]
            image_id = fields[2][3:]   # not used below
            file_name = fields[5][2:]  # not used below
            ext = fields[6][2:5]

        print(f'---- detect-labels {i} / {last} ----')

        image_uri = f'gs://{bucket_name}/images/{folder}/{n}.{ext}'
        labels = detect_labels(image_uri)
        # One text file per image: four 'key: value' lines per label, with a blank line between labels
        with open(f'images/google_cloud/labels/{folder}/{n}.txt', 'w', encoding='utf8') as f:
            for label in labels:
                f.write(f'description: {label.description}\n')
                f.write(f'mid: {label.mid}\n')
                f.write(f'score: {label.score}\n')
                f.write(f'topicality: {label.topicality}\n\n')
    except StopIteration:
        print('The iteration was stopped because there were empty files that have been removed.')

timestamp = datetime.datetime.now().strftime('%Y-%m-%d %H:%M:%S')
print(timestamp)
```

2024-09-02 19:13:43


Enter your project name:  build-159206
Enter your bucket name:  laelimages


---- detect-labels 1 / 80 ----
---- detect-labels 2 / 80 ----
---- detect-labels 3 / 80 ----
---- detect-labels 4 / 80 ----
---- detect-labels 5 / 80 ----
---- detect-labels 6 / 80 ----
---- detect-labels 7 / 80 ----
---- detect-labels 8 / 80 ----
---- detect-labels 9 / 80 ----
---- detect-labels 10 / 80 ----
---- detect-labels 11 / 80 ----
---- detect-labels 12 / 80 ----
---- detect-labels 13 / 80 ----
---- detect-labels 14 / 80 ----
---- detect-labels 15 / 80 ----
---- detect-labels 16 / 80 ----
---- detect-labels 17 / 80 ----
---- detect-labels 18 / 80 ----
---- detect-labels 19 / 80 ----
---- detect-labels 20 / 80 ----
---- detect-labels 21 / 80 ----
---- detect-labels 22 / 80 ----
---- detect-labels 23 / 80 ----
---- detect-labels 24 / 80 ----
---- detect-labels 25 / 80 ----
---- detect-labels 26 / 80 ----
---- detect-labels 27 / 80 ----
---- detect-labels 28 / 80 ----
---- detect-labels 29 / 80 ----
---- detect-labels 30 / 80 ----
---- detect-labels 31 / 80 ----
---- detect-label
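The label files written above use a simple layout: four `key: value` lines per label (`description`, `mid`, `score`, `topicality`), with a blank line between labels. For later corpus analysis, a reader along the lines of this sketch can load them back and filter by score. `Label` is a hypothetical stand-in for the label objects returned by `detect_labels()`, and only the four fields the notebook writes are modelled. The sample labels and their `mid` values are invented test data, not results from this study.

```python
from collections import namedtuple

# Hypothetical stand-in for the label objects returned by detect_labels()
Label = namedtuple('Label', ['description', 'mid', 'score', 'topicality'])


def format_labels(labels):
    """Serialise labels in the same layout the notebook writes to each .txt file."""
    parts = []
    for label in labels:
        parts.append(f'description: {label.description}\n')
        parts.append(f'mid: {label.mid}\n')
        parts.append(f'score: {label.score}\n')
        parts.append(f'topicality: {label.topicality}\n\n')
    return ''.join(parts)


def parse_labels(text):
    """Parse a label file back into Label records, one per blank-line-separated block."""
    labels = []
    for block in text.strip().split('\n\n'):
        if not block.strip():
            continue  # empty file
        values = {}
        for line in block.splitlines():
            # Split on the first ': ' only, so descriptions may contain ': '
            key, _, value = line.partition(': ')
            values[key] = value
        labels.append(Label(values['description'], values['mid'],
                            float(values['score']), float(values['topicality'])))
    return labels


# Invented sample data, for illustration only
sample = [Label('Sky', '/m/example1', 0.97, 0.97), Label('Tree', '/m/example2', 0.5, 0.4)]
text = format_labels(sample)
print([label.description for label in parse_labels(text) if label.score >= 0.8])  # ['Sky']
```

To load a real file, pass the contents of `images/google_cloud/labels/{folder}/{n}.txt` (read with `encoding='utf8'`) to `parse_labels`.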

## Considerations

### Cost

Please refer to [Google Cloud Vision API pricing](https://cloud.google.com/vision/pricing) and see the entry for `Label Detection`.
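A rough cost estimate follows from the number of images and the number of features requested per image. This assumes that each feature applied to one image counts as one billable unit, which is how we read the pricing page. The sketch below makes further simplifying assumptions: one monthly free allowance and a single flat price per 1,000 units beyond it, ignoring any tiering or rounding. The function `estimate_cost` is hypothetical. The free allowance and price in the example are placeholders, so take the real values from the pricing page.

```python
def estimate_cost(n_images, features_per_image, free_units, price_per_1000):
    """Rough monthly cost: units beyond the free allowance times a flat price per 1,000 units.

    Simplified model (assumption): one free allowance, one flat rate, no tiers or rounding.
    """
    units = n_images * features_per_image
    billable = max(0, units - free_units)
    return billable / 1000 * price_per_1000


# Placeholder rates; check the current values on the pricing page
print(estimate_cost(80, 1, free_units=1000, price_per_1000=1.5))    # 0.0
print(estimate_cost(2500, 1, free_units=1000, price_per_1000=1.5))  # 2.25
```

With a single `LABEL_DETECTION` feature, the 80 images in this proof of concept amount to 80 units.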