# MSCA ML Final Project: Face-to-BMI
# Part 1: Preparations

Deliverables:

1. Create a simple web API that predicts a user's BMI in real time; webcam input is optional. The approach is to take a pre-trained image model (e.g. VGG Face), fine-tune it on the provided data, and deploy it via a Jupyter notebook, Streamlit, Flask, or any other simple RESTful API.

2. A 10-page write-up of the implementation.

3. A 10-minute presentation or live demo in the final lecture.

Our goal is to beat the performance metrics provided in the paper.
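
To make "beating the metrics" concrete, here is a minimal sketch of two regression metrics that could be used for the comparison. This notebook does not state which metrics the paper reports, so Pearson correlation and mean absolute error are assumptions, and the numbers below are hypothetical values for illustration, not project results.

```python
import math

def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between true and predicted BMI."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def pearson_r(y_true, y_pred):
    """Pearson correlation between true and predicted BMI."""
    n = len(y_true)
    mean_t = sum(y_true) / n
    mean_p = sum(y_pred) / n
    cov = sum((t - mean_t) * (p - mean_p) for t, p in zip(y_true, y_pred))
    var_t = sum((t - mean_t) ** 2 for t in y_true)
    var_p = sum((p - mean_p) ** 2 for p in y_pred)
    return cov / math.sqrt(var_t * var_p)

# Hypothetical values, for illustration only (not results from this project).
y_true = [22.0, 26.5, 34.2, 37.8]
y_pred = [23.0, 25.5, 33.2, 38.8]
print(mean_absolute_error(y_true, y_pred))
print(pearson_r(y_true, y_pred))
```

Both functions take plain Python sequences, so they can be applied to the held-out predictions once a model exists.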

In [None]:
''' #colab
import os
# Load the Drive helper and mount
from google.colab import drive
# This will prompt for authorization.
drive.mount('/content/drive/')
path_gdrive = '/content/drive/MyDrive/Colab Datasets/ML/BMI'
os.chdir(path_gdrive)
print(os.getcwd())'''

In [7]:
import os

# Google Bucket
bucket_path = 'gs://msca-sp23-bucket/ml_data'
file = 'BMI-20230313T174553Z-001.zip'
runtime_path = '/home/jupyter/data/ml/BMI'

os.chdir(runtime_path)
print(os.getcwd())

/home/jupyter/data/ml/BMI


In [8]:
import os
import pandas as pd

INFO: Pandarallel will run on 8 workers.
INFO: Pandarallel will use Memory file system to transfer data between the main process and workers.


In [9]:
pd.set_option('display.max_rows', 100)
pd.set_option('display.max_columns', None)
pd.set_option('display.max_colwidth', 500)

In [10]:
# read csv file
data_path = '/home/jupyter/data/ml/BMI/data.csv'
bmi = pd.read_csv(data_path, index_col=0)

In [11]:
# check csv
bmi.head()

Unnamed: 0,bmi,gender,is_training,name
0,34.207396,Male,1,img_0.bmp
1,26.45372,Male,1,img_1.bmp
2,34.967561,Female,1,img_2.bmp
3,22.044766,Female,1,img_3.bmp
4,37.758789,Female,1,img_4.bmp


In [12]:
bmi.is_training.value_counts()

is_training
1    3368
0     838
Name: count, dtype: int64
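
The counts above correspond to roughly an 80/20 split (3368 of 4206 rows are flagged for training). A minimal sketch of how the `is_training` flag could be used to separate the two sets follows; `bmi_demo`, `train_df` and `test_df` are hypothetical names, and the last row of the demo table is made up so that both splits are non-empty.

```python
import pandas as pd

# Tiny stand-in for the `bmi` table: the first rows shown above plus one
# made-up held-out row (hypothetical).
bmi_demo = pd.DataFrame({
    "bmi": [34.207396, 26.45372, 34.967561, 30.0],
    "gender": ["Male", "Male", "Female", "Female"],
    "is_training": [1, 1, 1, 0],
    "name": ["img_0.bmp", "img_1.bmp", "img_2.bmp", "img_9999.bmp"],
})

# Rows flagged 1 go to training, rows flagged 0 are held out for testing.
train_df = bmi_demo[bmi_demo["is_training"] == 1].reset_index(drop=True)
test_df = bmi_demo[bmi_demo["is_training"] == 0].reset_index(drop=True)
print(len(train_df), len(test_df))
```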

In [13]:
bmi['imgae_type'] = bmi['name'].str.extract(r'\.(\w+)$')
bmi['imgae_type'].value_counts()

imgae_type
bmp    4206
Name: count, dtype: int64

In [17]:
# Code adapted from https://github.com/abhaymise/Face-to-height-weight-BMI-estimation-/blob/master/BMI%20prediction.ipynb
from glob import glob

profile_df = bmi.copy()

# map each image to its observation in the bmi table
data_folder = "/home/jupyter/data/ml/BMI/Images"

all_files = glob(data_folder+"/*")
# keep only files whose name ends with .bmp
all_imgs = sorted([img for img in all_files if img.endswith(".bmp")])
print("Total {} photos ".format(len(all_imgs)))

Total 3963 photos 


In [18]:
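import os
import re

def get_index_of_digit(string):
    # Return the first run of digits in the file name (not the full path),
    # so digits in a parent directory name cannot be picked up by mistake.
    digits = re.findall(r'\d+', os.path.basename(string))
    return digits[0]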
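
A stricter alternative could match the whole file name and return an integer, so that an unexpected file yields `None` rather than raising an `IndexError`. The sketch below assumes every image follows the `img_<number>.bmp` pattern seen in the `name` column; `IMG_NAME` and `image_id` are hypothetical names, not part of the referenced code.

```python
import re
from pathlib import Path

# Assumed naming scheme: img_<number>.bmp, as in the `name` column above.
IMG_NAME = re.compile(r"img_(\d+)\.bmp")

def image_id(path):
    """Return the integer id from a path like .../img_12.bmp, or None."""
    match = IMG_NAME.fullmatch(Path(path).name)
    return int(match.group(1)) if match else None

print(image_id("/home/jupyter/data/ml/BMI/Images/img_6.bmp"))  # 6
print(image_id("/data/run2/Images/img_6.bmp"))                 # 6
print(image_id("/data/Images/notes.txt"))                      # None
```

Returning an `int` directly would also make a later `astype(int)` cast unnecessary.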

In [23]:
# pair each image path with the numeric id parsed from its file name
id_path = [(get_index_of_digit(image), image) for image in all_imgs]
image_df = pd.DataFrame(id_path, columns=['id', 'path'])

In [26]:
# cast id to int so that sorting is numeric, not lexicographic
image_df['id'] = image_df['id'].astype(int)
image_df = image_df.sort_values(by='id')
image_df.head()

Unnamed: 0,id,path
0,0,/home/jupyter/data/ml/BMI/Images/img_0.bmp
1,1,/home/jupyter/data/ml/BMI/Images/img_1.bmp
1067,2,/home/jupyter/data/ml/BMI/Images/img_2.bmp
2117,3,/home/jupyter/data/ml/BMI/Images/img_3.bmp
3531,6,/home/jupyter/data/ml/BMI/Images/img_6.bmp


In [22]:
len(id_path)

3963

In [28]:
# extract id from bmi
bmi['id'] = bmi['name'].str.extract(r'(\d+)')
bmi['id'] = bmi['id'].astype(int)

# merge bmi and image
all_data = pd.merge(bmi, image_df, on='id', how='inner')
all_data.head()

Unnamed: 0,bmi,gender,is_training,name,imgae_type,id,path
0,34.207396,Male,1,img_0.bmp,bmp,0,/home/jupyter/data/ml/BMI/Images/img_0.bmp
1,26.45372,Male,1,img_1.bmp,bmp,1,/home/jupyter/data/ml/BMI/Images/img_1.bmp
2,34.967561,Female,1,img_2.bmp,bmp,2,/home/jupyter/data/ml/BMI/Images/img_2.bmp
3,22.044766,Female,1,img_3.bmp,bmp,3,/home/jupyter/data/ml/BMI/Images/img_3.bmp
4,25.845588,Female,1,img_6.bmp,bmp,6,/home/jupyter/data/ml/BMI/Images/img_6.bmp
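
Because the CSV has 4206 rows but only 3963 images were found, the inner merge can keep at most 3963 rows, so at least 243 labelled rows are silently dropped. One way to see which ids are affected is a left merge with `indicator=True`. The sketch below uses small stand-in tables (`bmi_demo`, `image_demo`, both hypothetical) that mirror the gap between `img_3` and `img_6` in the output above.

```python
import pandas as pd

# Small stand-ins for `bmi` and `image_df`; ids 4 and 5 have no image file.
bmi_demo = pd.DataFrame({"id": [0, 1, 2, 3, 4, 5, 6]})
image_demo = pd.DataFrame({
    "id": [0, 1, 2, 3, 6],
    "path": [f"Images/img_{i}.bmp" for i in [0, 1, 2, 3, 6]],
})

# A left merge keeps every labelled row; the `_merge` column records
# whether a matching image was found ("both") or not ("left_only").
audit = pd.merge(bmi_demo, image_demo, on="id", how="left", indicator=True)
missing = audit.loc[audit["_merge"] == "left_only", "id"].tolist()
print(missing)  # [4, 5]
```

Running the same audit on the real tables would list the labelled rows with no image before they disappear from `all_data`.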


In [29]:
# export df with file_path
all_data.to_csv(runtime_path+'/all_data.csv', index=False)