![](images/TangramMakerLogo.png)

# <center> Movie Review Rating Transform Toy Project </center>

### Background for this project

[Tangram Flex](https://www.tangramflex.com) produces a specification language called "Flex".  Flex lets you define interface standards and interface control documents (ICDs) that link directly to code.  In addition, Flex lets you define transforms between different types of interfaces, and these transforms are also linked to code.  This code is typically used in embedded systems to provide messaging APIs between software components.  You can read about the benefits of Flex [here](https://assets.website-files.com/5f203955af96993dab25b732/622f4f4b5271725653b903ae_WhyFlex_PublicWhitepaperv1.pdf).

When people are learning to train AI models, one of the first things they learn is how to use Natural Language Processing (NLP) to train a model that can accurately predict sentiment in free text.  A commonly used dataset for learning sentiment analysis is the [IMDB dataset](https://www.kaggle.com/datasets/lakshmi25npathi/imdb-dataset-of-50k-movie-reviews) of 50,000 free-text movie reviews, each labeled "Positive" or "Negative".  AI practitioners use the free text and the labels to train a model.  How that is done is outside the scope of this project, but for those interested I would point to the [fastai course lesson on sentiment analysis](https://www.github.com/fastai/course-v3/blob/master/nbs/dl1/lesson3-imdb.ipynb).  Note: the fastai course gives recommendations on how to leverage a GPU, which you would need to train this model.  Using the model is also outside the scope.  However, here is a [link](https://www.github.com/jmstadt/flask-movie-reviews) that describes how I hosted a trained model, using the Render hosting site, to take in free text and return the sentiment for movies.

Kaggle also hosts a [movie review dataset](https://www.kaggle.com/datasets/rounakbanik/the-movies-dataset).  This is a tabular dataset and the label ratings go from 1 to 5.  AI experimenters often want to leverage different datasets in a project, for example using the Kaggle dataset to improve the IMDB model.  To do this, the practitioner needs to write a transform from the Kaggle 1 to 5 labels to the IMDB Positive/Negative labels.

Typically, an AI practitioner would hand code a transformation.  However, this can become problematic.  In particular for Kaggle to IMDB, most people would transform the 1 and 2 labels to negative and the 4 and 5 labels to positive.  But what about the 3 label?  Different coders might make different choices, which can result in different behavior in trained AI models.  In addition, you would have to review their code to see what they did.
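To make the ambiguity concrete, here is a minimal sketch of two hand-coded transforms that two different coders might plausibly write.  The function names `hand_coded_a` and `hand_coded_b` are hypothetical and are not part of this project; they only illustrate that both versions agree everywhere except on the 3 label.

```python
def hand_coded_a(rating: int) -> str:
    """One coder's choice: a middle rating of 3 counts as Positive."""
    return "Positive" if rating >= 3 else "Negative"


def hand_coded_b(rating: int) -> str:
    """Another coder's choice: a middle rating of 3 counts as Negative."""
    return "Positive" if rating >= 4 else "Negative"


ratings = [1, 2, 3, 4, 5]

# Collect every rating on which the two hand-coded transforms disagree.
disagreements = [r for r in ratings if hand_coded_a(r) != hand_coded_b(r)]
print(disagreements)  # [3]
```

Neither version is wrong on its own, but a model trained on data labeled by one will not see the same labels as a model trained on data labeled by the other.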

But what if there was a documented, standard way of performing this transform?  And rather than reviewing this standard, you could call that standard from code.  Providing this is the purpose of this toy project.  I will create a standard for this transform in Flex.  I will then leverage Tangram Maker to generate code for this transform.  Then, within this notebook, I will call this code to transform a Kaggle rating label to an IMDB rating label, performed per the standard.

**So let's get started!  The following steps will walk you through the process.**

### Get a Free Tangram Maker Account so you can author your Flex standard

Sign up for a free Tangram Maker account [here](https://www.tangramflex.com/tangram-maker).  Once you have your login information, you can access Tangram Maker at https://maker.tangramflex.io.

<img src="images/TangramMakerLogin.png" width ="400"/>

Once logged in, you can review docs and tutorials at https://docs.tangramflex.io/docs/tutorials/what_is_tangrampro.  Note:  Tangram Maker is the free version of Tangram Pro and the tutorials for Tangram Maker apply to Tangram Pro.

### Author The Kaggle Rating to IMDB Rating Standard Transform in Flex

The required Flex interfaces and transform code are provided in this repository's **SupportingCode folder**.  You can use this code by copying and pasting it into the Tangram Maker Flex Editor while following the "Author a Message Set with Flex" tutorial here: https://docs.tangramflex.io/docs/tutorials/flex_authorship.  For more details on authoring Flex, go to: https://docs.tangramflex.io/docs/flex/start.  Note: You can view or edit a .flex file in any text editor.

The resulting directory structure should look as follows in Tangram Maker.  Note:  Tangram Maker automatically increments the version of published message packages when you edit them.  I edited my first Kaggle interface definition twice, so I am at version v3 for Kaggle and at version v2 for IMDB.  Note your versions, as they are used in the Transform as well as in the resulting generated code.

<img src="images/MakerMoviesFlexDirectoryStructure.png" width="400"/>

The Transform and the Interfaces will look as follows:

<img src="images/MakerFlexOneToFiveReview.png" width="200"/>
<img src="images/MakerFlexPosNegReview.png" width="200"/>
<img src="images/MakerFlexKaggleToIMDBTransform.png" width="200"/>

One of the great things in Tangram Maker is that it allows you to visualize your transform as a model.  Most people want to model capabilities these days, and Tangram Maker allows you to visualize your Flex code.  It is really easy to do in Tangram Maker.  Let's see if you can do it.  For hints you can go to https://docs.tangramflex.io/docs/tutorials/visualize_a_Component_based_system_design.  When you are done, the model should look something like the below.  Note: if you don't get there, the next step will show you how.

<img src="images/MakerKaggleToIMDBTransformDesign.png" width="400"/>

### Generate Usable Code from the Standard Transform you created

OK, great.  We have created our standard, now let's put it to use.  Tangram Maker allows you to generate code from a model.  Follow the tutorial at https://docs.tangramflex.io/docs/tutorials/implement_transform as follows: go through Step 1 (you may have already done it), then Step 2, skip Step 3, and do only the second part of Step 4.  That will give you output code to download.  You should see the following when you set up your workflow and your code gen in Maker.

<img src="images/MakerWorkflow.png" width="200"/>
<img src="images/MakerCodeGen.png" width="200"/>

When you download your code the directory structure should look as follows:

<img src="images/MakerDownloadStructure.png" width="200"/>

If, unlike me, you are a good C++ programmer, you are now good to go.  You have all the getters and setters needed to leverage the standard you created.

But if you are like me, it would sure be nice to use this new standard in Python in a Jupyter notebook, so let's do a little more so we can do that.

### Prep the output code to use in a Jupyter Notebook

The first thing to note is that the generated code is designed to run on Linux.  If you have Linux, you can skip this.  I have a Mac, so I set up VirtualBox.  There are a variety of ways to set up VirtualBox, but you can follow the steps in this tutorial:  https://docs.tangramflex.io/docs/tutorials/tutorials/implement_csi

Once you have that set up, you can install Jupyter with "sudo pip install jupyter" and then start the notebook with "jupyter notebook".

Just a couple more things now.

First, at the time of writing this project there is a bug in Maker in the variables.mk file in the Transforms folder.  If you see "Unknown" or "Unknown Folder" for the input and output CSI, change those to v3 and v2 respectively (or to your own versions) so it looks like the below.  If the entries already show a folder, good news: we fixed the bug :)

<img src="images/variables_mk_edit.png" width="400"/>

OK, next: the Python package I am using is ctypes.  ctypes works with C, not C++, so I just have to wrap the getters and setters in an extern "C" helper function.  This helper is included in the SupportingCode folder, along with the associated Makefile.  Put the helper C++ program in the Transform folder and replace the generated Makefile in the Transform folder with the provided Makefile.

Finally, to build the external C interface, go into the v3 folder in a terminal and run "make clean" and then "make ffi=y".  Do the same in the v2 folder.  Then go to the Transform folder and run "make".  When it compiles, you should have a new "libmoviesmsg.so" in that folder.  If you do, you are good to go for the next step.
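If you would rather confirm the build result from Python than by eye, a small check like the following may help.  The helper name `shared_library_built` is hypothetical, and the directory shown is just the example download folder used in the notebook cells further down; substitute your own path.

```python
from pathlib import Path


def shared_library_built(lib_dir: str, lib_name: str = "libmoviesmsg.so") -> bool:
    """Return True if the compiled shared object exists in lib_dir."""
    return (Path(lib_dir) / lib_name).is_file()


# Example path only; adjust it to wherever you unpacked the generated code.
print(shared_library_built("./movies_out_24_march_2022/transform"))
```

If this prints `False`, re-check the three "make" steps above before moving on.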

### Use the ctypes package and the libmoviesmsg.so you generated to call your standard from a Jupyter notebook

If you are not familiar with Jupyter notebooks: click on a cell below, edit it if you like, and hit shift-enter to execute its code.

In [1]:
# import the ctypes package in order to read the Tangram Maker 
# generated c/c++ transform standard represented in a shared object
import numpy.ctypeslib as ctl
import ctypes

In [2]:
# check your working directory and align editing directories in below cells accordingly
%pwd

'/media/sf_toy_movies'

In [3]:
# assign directories
movie_libname = 'libmoviesmsg.so'
movie_libdir = './movies_out_24_march_2022/transform'
movie_lib=ctl.load_library(movie_libname, movie_libdir)

In [4]:
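# declare functions from the shared object
# ctypes assumes a C int return type by default, so functions that return
# pointers need an explicit restype (the message handles are assumed here
# to be opaque pointers, hence c_void_p)
# the Kaggle Review Rating Function
py_int_movie_msg = movie_lib.create_and_populate_kaggle_msg
# the kaggle Review Rating Argument Type
py_int_movie_msg.argtypes = [ctypes.c_int]
py_int_movie_msg.restype = ctypes.c_void_p
# the create placeholder IMDB Message Function
py_imdb_msg = movie_lib.create_imdb_msg
py_imdb_msg.restype = ctypes.c_void_p
# the transform function from the kaggle review to the IMDB review
my_try = movie_lib.transform_kaggle2imdb
my_try.argtypes = [ctypes.c_void_p, ctypes.c_void_p]
# the return kaggle movie rating transformed to the IMDB review rating function
my_imdb_review = movie_lib.return_imdb_review
my_imdb_review.argtypes = [ctypes.c_void_p]
# the returned imdb review result type
my_imdb_review.restype = ctypes.c_char_p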
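
The `argtypes` and `restype` declarations matter because ctypes otherwise assumes every function returns a C int.  As a stand-alone illustration that does not need the generated library, the sketch below applies the same pattern to two functions from the C standard library.  Loading libc this way is an assumption that holds on typical Linux systems; it is not part of this project.

```python
import ctypes
import ctypes.util

# Load the C standard library purely for illustration. If find_library returns
# None, CDLL(None) exposes the symbols already loaded in the process on Linux.
libc = ctypes.CDLL(ctypes.util.find_library("c"))

# size_t strlen(const char *s)
strlen = libc.strlen
strlen.argtypes = [ctypes.c_char_p]
strlen.restype = ctypes.c_size_t

# char *strchr(const char *s, int c) returns a pointer, so restype must be set;
# otherwise ctypes would treat the return value as a C int.
strchr = libc.strchr
strchr.argtypes = [ctypes.c_char_p, ctypes.c_int]
strchr.restype = ctypes.c_char_p

length = strlen(b"Positive")
tail = strchr(b"Positive", ord("t"))  # bytes starting at the first "t"

# As in the notebook, c_char_p results come back as bytes and need decoding.
print(length, tail.decode("utf-8"))  # 8 tive
```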

In [5]:
def kaggle2imdb(kaggle_review):
    my_input = py_int_movie_msg(kaggle_review)
    my_output = py_imdb_msg()
    my_transform_success = my_try(my_input, my_output)
    my_transform_output = my_imdb_review(my_output)
    # ctypes returns bytes that need to be decoded into a python string
    my_workable_output = my_transform_output.decode("utf-8")
    return my_workable_output

In [6]:
#input a movie review rating (Kaggle ratings are 1:5)
review = 5

In [7]:
#transform the movie review rating from Kaggle to Imdb and return the result
kaggle2imdb(review)

'Positive'
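
In practice you would transform a whole column of ratings rather than one value.  The sketch below shows one way that could look.  The helper `transform_ratings` and the stand-in `stub_transform` are hypothetical: the stub only exists so the example runs without the shared object, and in the notebook you would pass the `kaggle2imdb` function defined above instead.

```python
from typing import Callable, Iterable, List


def transform_ratings(ratings: Iterable[int],
                      transform: Callable[[int], str]) -> List[str]:
    """Apply a rating transform to each rating, rejecting values outside 1..5."""
    labels = []
    for rating in ratings:
        if not 1 <= rating <= 5:
            raise ValueError(f"Kaggle ratings must be 1 to 5, got {rating}")
        labels.append(transform(rating))
    return labels


def stub_transform(rating: int) -> str:
    """Stand-in for kaggle2imdb; NOT the Flex standard, for illustration only."""
    return "Positive" if rating >= 4 else "Negative"


labels = transform_ratings([1, 2, 4, 5], stub_transform)
print(labels)  # ['Negative', 'Negative', 'Positive', 'Positive']
```

Because the transform is passed in as a function, every caller that uses `kaggle2imdb` gets the behavior defined by the standard rather than a local reinterpretation.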

In [8]:
#there are functions to free the memory that are provided that can also 
#be called and should be done for projects that are not toys

### That is it.  Well done.

This toy project may seem trivial, but think about it.  You created a standard.  You modeled a standard.  You generated usable code to implement that standard, and you then used that standard to get a job done.  You could publish this standard and provide either Maker accounts or the resulting code so that a community can use that standard, as opposed to everyone interpreting it in their own code as they see fit.  Now everyone is on the same page.