In [1]:
# This Python 3 environment comes with many helpful analytics libraries installed
# It is defined by the kaggle/python Docker image: https://github.com/kaggle/docker-python
# For example, here's several helpful packages to load

# Input data files are available in the read-only "../input/" directory
# For example, running this (by clicking run or pressing Shift+Enter) will list all files under the input directory

import os
for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        print(os.path.join(dirname, filename))

# You can write up to 20GB to the current directory (/kaggle/working/) that gets preserved as output when you create a version using "Save & Run All" 
# You can also write temporary files to /kaggle/temp/, but they won't be saved outside of the current session

# Recommender System with TFX pipelines
This notebook builds MLOps components and pipelines with TFX for the recommender system developed in [this notebook](https://www.kaggle.com/code/nicholeasuniquename/recommender-systems/).

1. Create a virtual environment for TFX compatibility.
2. Build the TFX components and upload them to the public repository.
3. Download the components and test them in the virtual environment here.
4. Build the MLOps pipelines and upload them to the public repository.
5. Download the pipelines and test them in the virtual environment here.


## 1. Creating the TFX-compatible virtual environment

TFX version 1.16.0 is the latest stable release as of Sep 28, 2025.

It is compatible with Python 3.9 and 3.10 only.

The current Kaggle Python Docker image uses Python 3.11.13.
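
The version constraint above can be checked from the notebook before doing any setup. The following is a minimal sketch: the SUPPORTED_MINORS set only encodes the 3.9 and 3.10 versions stated above, and the function name is made up for illustration.

```python
import sys

# Python (major, minor) versions that TFX 1.16.0 supports, per the note above.
SUPPORTED_MINORS = {(3, 9), (3, 10)}


def is_tfx_compatible(version_info=sys.version_info):
    """Return True if the given interpreter version is in the supported set."""
    return (version_info[0], version_info[1]) in SUPPORTED_MINORS


if not is_tfx_compatible():
    # On the Kaggle image described above this branch is the expected one.
    print(f"Python {sys.version_info[0]}.{sys.version_info[1]} "
          "needs a separate environment for TFX")
```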

To use an earlier version of Python on Kaggle, one can install conda and create a virtual environment based on that version.

Once conda is installed and the virtual environment is created, the environment is activated in two steps: first activate conda, then activate the virtual environment.

A bash shell invoked in the notebook with the magic command %%bash lasts only for that specific cell.
Each %%bash cell starts a new session, so the 2 activation commands must be run at the top of every cell that uses the virtual environment.

Besides running scripts in %%bash cells, we can also run them with the Python subprocess library, as long as each command is prefixed with the 2 conda activation statements (see the definition of run_command below).

So we have 2 ways to run commands within the virtual environment.

The notebook itself still runs on the Kaggle Docker image environment, not the newly built virtual environment.
Even if we install ipykernel and register a kernel for the new virtual environment, I don't see a way to switch the notebook to that kernel.  (The Kaggle Session options include a persistence option for files and variables, so it might be possible to restart the notebook with the new kernel selected, provided the kernel has Kaggle-specific notebook support...)

In summary, the notebook as is can be used for intermediate steps such as EDA, where the libraries don't require an earlier version of Python.  For MLOps steps that need an earlier version of Python, the virtual environment is available.


In [2]:
!pwd
!echo $HOME

/kaggle/working
/root


%%bash: Executes the entire cell as a shell script. 

In [3]:
%%bash
t0=$(date +%s%N)
mkdir -p ~/miniconda3
wget -q https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh -O ~/miniconda3/miniconda.sh
#install conda into /usr/local
bash ~/miniconda3/miniconda.sh -b -u -p /usr/local
rm ~/miniconda3/miniconda.sh
conda tos accept --override-channels --channel https://repo.anaconda.com/pkgs/main
conda tos accept --override-channels --channel https://repo.anaconda.com/pkgs/r

. /usr/local/bin/activate
echo "**$SHELL**"
echo "**$BASH**"
conda init --all

. /root/.bashrc
conda create -q --name my_tfx_env python=3.10 -y
conda activate my_tfx_env
python --version

t1=$(date +%s%N)
t2=$(echo "scale=9;($t1-$t0) / 1000000000" | bc)
echo $t2 seconds
date

PREFIX=/usr/local
Unpacking bootstrapper...
Unpacking payload...

Installing base environment...

Preparing transaction: ...working... done
Executing transaction: ...working... done
installation finished.
    You currently have a PYTHONPATH environment variable set. This may cause
    unexpected behavior when running the Python interpreter in Miniconda3.
    For best results, please verify that your PYTHONPATH only points to
    directories of packages that are compatible with the Python interpreter
    in Miniconda3: /usr/local
accepted Terms of Service for https://repo.anaconda.com/pkgs/main
accepted Terms of Service for https://repo.anaconda.com/pkgs/r
**/bin/bash**
**/usr/bin/bash**
no change     /usr/local/condabin/conda
no change     /usr/local/bin/conda
no change     /usr/local/bin/conda-env
no change     /usr/local/bin/activate
no change     /usr/local/bin/deactivate
no change     /usr/local/etc/profile.d/conda.sh
no change     /usr/local/etc/fish/conf.d/conda.fish
no change   

To activate the conda environment, first source conda's activate script (installed in /usr/local/bin above), then activate the conda virtual environment.

This has to be done in each %%bash cell.

In [4]:
%%bash
t0=$(date +%s%N)
. /usr/local/bin/activate
conda activate my_tfx_env
python --version

#consider conda install ipykernel
conda install -y pip

#see dependencies https://github.com/tensorflow/transform
pip -q install pyarrow==10.0.1
pip -q install apache-beam==2.59.0
pip -q install tensorflow==2.16.1
pip -q install tensorflow-transform==1.16.0
pip -q install tfx==1.16.0
pip -q install pytest
#
#tf metadata 1.16.1
#tfx-bsl
#keeps protobuf 3.20.3
#if use sparkrunner, install pyspark 4.0.0 or 3.3.x

pip list
t1=$(date +%s%N)
t2=$(echo "scale=9;($t1-$t0) / 60000000000" | bc)
echo "$t2 minutes"
date
#about 6-7 minutes for this cell.

Python 3.10.18
2 channel Terms of Service accepted
Channels:
 - defaults
Platform: linux-64
Collecting package metadata (repodata.json): done
Solving environment: done

# All requested packages already installed.

Package                            Version
---------------------------------- --------------
absl-py                            1.4.0
annotated-types                    0.7.0
anyio                              4.11.0
apache-beam                        2.59.0
argon2-cffi                        25.1.0
argon2-cffi-bindings               25.1.0
arrow                              1.3.0
astunparse                         1.6.3
async-lru                          2.0.5
async-timeout                      5.0.1
attrs                              23.2.0
babel                              2.17.0
backcall                           0.2.0
beautifulsoup4                     4.14.2
bleach                             6.2.0
cachetools                         5.5.2
certifi                       



    current version: 25.7.0
    latest version: 25.9.0

Please update conda by running

    $ conda update -n base -c defaults conda




In [5]:
%%bash
. /usr/local/bin/activate
conda activate my_tfx_env
python --version
pip show apache-beam

#refresh the test dirs
rm -rf /kaggle/working/bin/*

Python 3.10.18
Name: apache-beam
Version: 2.59.0
Summary: Apache Beam SDK for Python
Home-page: https://beam.apache.org
Author: Apache Software Foundation
Author-email: dev@beam.apache.org
License: Apache License, Version 2.0
Location: /usr/local/envs/my_tfx_env/lib/python3.10/site-packages
Requires: cloudpickle, crcmod, dill, fastavro, fasteners, grpcio, hdfs, httplib2, js2py, jsonpickle, jsonschema, numpy, objsize, orjson, packaging, proto-plus, protobuf, pyarrow, pyarrow-hotfix, pydot, pymongo, python-dateutil, pytz, redis, regex, requests, typing-extensions, zstandard
Required-by: tensorflow-data-validation, tensorflow-transform, tensorflow_model_analysis, tfx, tfx-bsl


The run_command function below is from
https://www.kaggle.com/code/taylorsamarel/change-python-version-kaggle-v2-taylor-amarel

In [6]:
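import subprocess

def run_command(cmd, capture=True, check=False):
    """Run cmd in a shell after activating conda and the my_tfx_env environment.

    With capture=True, returns stdout (or stderr when stdout is empty).
    Otherwise returns True when the command exits with status 0.
    If an exception is raised, its message is returned as a string.
    """
    # Each subprocess starts a fresh shell, so both activation steps are repeated.
    cmds = f". /usr/local/bin/activate; conda activate my_tfx_env; {cmd}"
    try:
        result = subprocess.run(cmds, shell=True, capture_output=capture, text=True, check=check)
        if capture:
            return result.stdout.strip() if result.stdout else result.stderr.strip()
        return result.returncode == 0
    except Exception as e:
        return str(e)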

In [7]:
print(run_command("python --version"))

Python 3.10.18
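
run_command folds stdout, stderr and exceptions into one return value, which is convenient for printing but makes failures hard to tell apart from normal output. A possible variant, sketched here under assumptions, returns the exit code and both streams separately. The names run_in_env, CommandResult and DEFAULT_ACTIVATE are hypothetical, and chaining with && (instead of ;) is a design choice of this sketch rather than something the referenced notebook does.

```python
import subprocess
from typing import NamedTuple

# Activation prefix used throughout this notebook, chained with '&&'.
DEFAULT_ACTIVATE = ". /usr/local/bin/activate && conda activate my_tfx_env"


class CommandResult(NamedTuple):
    returncode: int
    stdout: str
    stderr: str


def run_in_env(cmd, activate=DEFAULT_ACTIVATE):
    """Run cmd in a shell after the activation steps; return code and both streams."""
    # '&&' stops the chain if activation fails, so cmd does not run
    # in the wrong environment.
    proc = subprocess.run(f"{activate} && {cmd}", shell=True,
                          capture_output=True, text=True)
    return CommandResult(proc.returncode, proc.stdout.strip(), proc.stderr.strip())
```

A call such as run_in_env("python --version") can then be checked with result.returncode before result.stdout is trusted.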


### 1.a. Download a TFX test script and test that the library versions are compatible

In [34]:
%%bash
. /usr/local/bin/activate
conda activate my_tfx_env

#it can take a couple of minutes to get current version of recently uploaded file to github
#wget -q -c --no-cache https://raw.githubusercontent.com/nking/recommender_systems/refs/heads/main/src/test/python/test_tft.py -O /kaggle/working/test_tft.py
#curl --header "Cache-Control: no-cache" "https://api.github.com/repos/nking/recommender_systems/content/src/test/python/test_tft.py" -o /kaggle/working/test_tft.py

rm -f /kaggle/working/dataset_tfxio_example.py
wget -q -c --no-cache https://raw.githubusercontent.com/nking/recommender_systems/refs/heads/main/src/test/python/dataset_tfxio_example.py -O /kaggle/working/dataset_tfxio_example.py

ls -l /kaggle/working

#run a test example from Google's TFX codebase:
python3 /kaggle/working/dataset_tfxio_example.py

date

total 112
drwxr-xr-x 6 root root  4096 Oct 10 01:17 bin
-rw-r--r-- 1 root root  7273 Oct 10 02:09 csv_example_gen_test.py
-rw-r--r-- 1 root root   777 Oct 10 02:09 CustomUTF8Coder.py
-rw-r--r-- 1 root root  2392 Oct 10 02:28 dataset_tfxio_example.py
-rw-r--r-- 1 root root  7899 Oct 10 02:09 ingest_movie_lens_beam.py
-rw-r--r-- 1 root root  5043 Oct 10 02:09 ingest_movie_lens_beam_test.py
-rw-r--r-- 1 root root  5796 Oct 10 02:09 ingest_movie_lens_component.py
-rw-r--r-- 1 root root  9813 Oct 10 02:09 ingest_movie_lens_component_test.py
-rw-r--r-- 1 root root 12556 Oct 10 02:09 ingest_movie_lens_custom_component.py
-rw-r--r-- 1 root root 11628 Oct 10 02:09 ingest_movie_lens_custom_component_test.py
drwxr-x--- 3 root root  4096 Oct 10 01:07 ml-1m
-rw-r--r-- 1 root root 11933 Oct 10 02:09 movie_lens_utils.py
-rw-r--r-- 1 root root  4377 Oct 10 02:09 movie_lens_utils_test.py
drwxr-xr-x 2 root root  4096 Oct 10 02:15 __pycache__
{'x_centered': [[-4.0], [-3.0], [-2.0], [-1.0], [0.0]],
 'x_sc

I1010 02:28:10.672368 138017414244160 pipeline.py:197] Missing pipeline option (runner). Executing pipeline using the default runner: DirectRunner.
I1010 02:28:12.561774 138017414244160 statecache.py:214] Creating state cache with size 104857600
I1010 02:28:12.734366 138017414244160 functional_saver.py:438] Sharding callback duration: 7
I1010 02:28:12.777997 138017414244160 functional_saver.py:438] Sharding callback duration: 8
INFO:tensorflow:Assets written to: /tmp/tmpxdy0vqmx/tftransform_tmp/67da4b526b5d4d3f86b05f8ac3165eb4/assets
I1010 02:28:12.810341 138017414244160 builder_impl.py:829] Assets written to: /tmp/tmpxdy0vqmx/tftransform_tmp/67da4b526b5d4d3f86b05f8ac3165eb4/assets
I1010 02:28:12.815073 138017414244160 fingerprinting_utils.py:49] Writing fingerprint to /tmp/tmpxdy0vqmx/tftransform_tmp/67da4b526b5d4d3f86b05f8ac3165eb4/fingerprint.pb
INFO:tensorflow:struct2tensor is not available.
I1010 02:28:13.202903 138017414244160 saved_transform_io.py:166] struct2tensor is not avail

## 2.a. Download a MovieLens dataset

In [35]:
%%bash
wget -q http://files.grouplens.org/datasets/movielens/ml-1m.zip -O /kaggle/working/ml-1m.zip
unzip -o /kaggle/working/ml-1m.zip
ls /kaggle/working/ml-1m/
rm /kaggle/working/ml-1m.zip

head -n 5 /kaggle/working/ml-1m/ratings.dat
head -n 5 /kaggle/working/ml-1m/users.dat
head -n 5 /kaggle/working/ml-1m/movies.dat

Archive:  /kaggle/working/ml-1m.zip
  inflating: ml-1m/movies.dat        
  inflating: ml-1m/ratings.dat       
  inflating: ml-1m/README            
  inflating: ml-1m/users.dat         
movies.dat
ratings.dat
README
tmp
users.dat
1::1193::5::978300760
1::661::3::978302109
1::914::3::978301968
1::3408::4::978300275
1::2355::5::978824291
1::F::1::10::48067
2::M::56::16::70072
3::M::25::15::55117
4::M::45::7::02460
5::M::25::20::55455
1::Toy Story (1995)::Animation|Children's|Comedy
2::Jumanji (1995)::Adventure|Children's|Fantasy
3::Grumpier Old Men (1995)::Comedy|Romance
4::Waiting to Exhale (1995)::Comedy|Drama
5::Father of the Bride Part II (1995)::Comedy


## 2.b. Write the TFX components, Beam PTransforms and unit tests

Upload them to a reachable repository.
If the repository is private, you can use Kaggle secrets to hold API keys and similar credentials for use in the downloads below.
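
For the private-repository case, a download helper might look roughly like the sketch below. kaggle_secrets.UserSecretsClient is the Kaggle helper for reading notebook secrets and is only importable inside a Kaggle session. The secret label GITHUB_TOKEN, the Bearer header scheme and all function names are assumptions for illustration; check what your repository host expects.

```python
import urllib.request


def build_request(url, token=None):
    """Create a request, adding an Authorization header when a token is given."""
    request = urllib.request.Request(url)
    if token:
        # Assumption: the host accepts a Bearer token in this header.
        request.add_header("Authorization", f"Bearer {token}")
    return request


def read_kaggle_secret(label):
    """Read a secret stored under Add-ons > Secrets (works only on Kaggle)."""
    from kaggle_secrets import UserSecretsClient
    return UserSecretsClient().get_secret(label)


def download(url, dest, token=None):
    """Fetch url and write the response body to dest."""
    with urllib.request.urlopen(build_request(url, token)) as response:
        with open(dest, "wb") as out:
            out.write(response.read())


# Hypothetical usage inside a Kaggle session:
# token = read_kaggle_secret("GITHUB_TOKEN")
# download(file_url, "/kaggle/working/movie_lens_utils.py", token)
```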

## 3. Download the components and transforms

### 3.a. Ingestion

The first component is ingestion. It is written using custom Apache Beam PTransforms and custom TFX components.

Customization was needed to ingest the 3 files ("ratings.dat", "movies.dat", "users.dat"), left join users and movies onto ratings, and then split the result.
The components are called IngestMovieLensComponent and ingest_movie_lens_component for the fully custom version and the Python function-based version, respectively.
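
To make the join concrete, here is a minimal plain-Python sketch of a left join with ratings as the left table. It is not the Beam implementation used by the components. The function names are hypothetical, and the output columns mirror the ones printed by the Beam test further down (user_id, movie_id, rating, timestamp, gender, age, occupation, genres).

```python
def parse_rows(lines, delim="::"):
    """Split each non-empty line of a MovieLens .dat file on its delimiter."""
    return [line.rstrip("\n").split(delim) for line in lines if line.strip()]


def left_join_on_ratings(ratings, users, movies):
    """Keep every rating; attach user and movie fields when the ids match."""
    users_by_id = {row[0]: row for row in parse_rows(users)}
    movies_by_id = {row[0]: row for row in parse_rows(movies)}
    joined = []
    for user_id, movie_id, rating, timestamp in parse_rows(ratings):
        user = users_by_id.get(user_id)     # None when the user is missing
        movie = movies_by_id.get(movie_id)  # None when the movie is missing
        joined.append({
            "user_id": int(user_id),
            "movie_id": int(movie_id),
            "rating": int(rating),
            "timestamp": int(timestamp),
            "gender": user[1] if user else None,
            "age": int(user[2]) if user else None,
            "occupation": int(user[3]) if user else None,
            "genres": movie[2] if movie else None,
        })
    return joined


# A few of the rows printed by the head commands earlier in this notebook.
RATINGS = ["1::1193::5::978300760", "1::661::3::978302109"]
USERS = ["1::F::1::10::48067", "2::M::56::16::70072"]
MOVIES = ["1::Toy Story (1995)::Animation|Children's|Comedy"]

rows = left_join_on_ratings(RATINGS, USERS, MOVIES)
print(rows[0])
```

Movie 1193 is not among the sample movie rows, so genres is None for that rating while the user fields are filled in. Keeping such rows is what distinguishes the left join from an inner join.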

I implemented both the fully custom component and the Python function component, but only one of them is needed.
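
The split step can be illustrated in the same spirit. The tests below configure hash buckets of 80, 10 and 10 for train, eval and test. The sketch assigns a record to a split by hashing a key into 100 buckets. MD5 is used only as a deterministic stand-in, not as the hash that TFX or these components use, and the function name is hypothetical.

```python
import hashlib

# Split names and bucket counts, as in the output_config printed by the tests.
SPLITS = [("train", 80), ("eval", 10), ("test", 10)]


def assign_split(record_key, splits=SPLITS):
    """Map a record key to a split name using cumulative hash buckets."""
    total_buckets = sum(buckets for _, buckets in splits)
    digest = hashlib.md5(record_key.encode("utf-8")).hexdigest()
    bucket = int(digest, 16) % total_buckets
    upper_bound = 0
    for name, buckets in splits:
        upper_bound += buckets
        if bucket < upper_bound:
            return name
    raise ValueError("bucket fell outside the configured splits")


# The same key always lands in the same split, so reruns are reproducible.
print(assign_split("1::1193::5::978300760"))
```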

In [46]:
%%bash
#it can take a couple of minutes to get current version of recently uploaded file to github
#wget -q -c --no-cache https://raw.githubusercontent.com/nking/recommender_systems/refs/heads/main/src/test/python/test_tft.py -O /kaggle/working/test_tft.py
#curl --header "Cache-Control: no-cache" "https://api.github.com/repos/nking/recommender_systems/content/src/test/python/test_tft.py" -o /kaggle/working/test_tft.py

repo_uri='https://raw.githubusercontent.com/nking/recommender_systems/refs/heads/development/src/main/python'
declare -a my_files=("ingest_movie_lens_beam.py" "CustomUTF8Coder.py" "ingest_movie_lens_component.py" "movie_lens_utils.py" "ingest_movie_lens_custom_component.py")
for item in "${my_files[@]}"
do
  rm -f "/kaggle/working/$item"
  echo "$item"
  wget -q -c --no-cache "$repo_uri/$item" -O /kaggle/working/$item
done

#repo_uri='https://raw.githubusercontent.com/nking/recommender_systems/refs/heads/development/src/drafts/python'
#declare -a my_files=("ingest_movie_lens_custom_component.py")
#for item in "${my_files[@]}"
#do
#  rm -f "/kaggle/working/$item"
#  echo "$item"
#  wget -q -c --no-cache "$repo_uri/$item" -O /kaggle/working/$item
#done

repo_uri='https://raw.githubusercontent.com/nking/recommender_systems/refs/heads/development/src/test/python'
declare -a my_files=("ingest_movie_lens_beam_test.py" "ingest_movie_lens_component_test.py" "ingest_movie_lens_custom_component_test.py" "movie_lens_utils_test.py" "csv_example_gen_test.py")
for item in "${my_files[@]}"
do
  rm -f "/kaggle/working/$item"
  echo "$item"
  wget -q -c --no-cache "$repo_uri/$item" -O /kaggle/working/$item
done

ls -l /kaggle/working/
date

ingest_movie_lens_beam.py
CustomUTF8Coder.py
ingest_movie_lens_component.py
movie_lens_utils.py
ingest_movie_lens_custom_component.py
ingest_movie_lens_beam_test.py
ingest_movie_lens_component_test.py
ingest_movie_lens_custom_component_test.py
movie_lens_utils_test.py
csv_example_gen_test.py
total 112
drwxr-xr-x 6 root root  4096 Oct 10 01:17 bin
-rw-r--r-- 1 root root  7335 Oct 10 03:18 csv_example_gen_test.py
-rw-r--r-- 1 root root   777 Oct 10 03:18 CustomUTF8Coder.py
-rw-r--r-- 1 root root  2392 Oct 10 02:28 dataset_tfxio_example.py
-rw-r--r-- 1 root root  7899 Oct 10 03:18 ingest_movie_lens_beam.py
-rw-r--r-- 1 root root  5043 Oct 10 03:18 ingest_movie_lens_beam_test.py
-rw-r--r-- 1 root root  5796 Oct 10 03:18 ingest_movie_lens_component.py
-rw-r--r-- 1 root root  9877 Oct 10 03:18 ingest_movie_lens_component_test.py
-rw-r--r-- 1 root root 12556 Oct 10 03:18 ingest_movie_lens_custom_component.py
-rw-r--r-- 1 root root 11768 Oct 10 03:18 ingest_movie_lens_custom_component_test.py


### Run the unit tests

In [37]:
%%bash

. /usr/local/bin/activate
conda activate my_tfx_env

python --version

echo "run test for CSVExampleGen"

t0=$(date +%s%N)

python -m unittest /kaggle/working/csv_example_gen_test.py

t1=$(date +%s%N)
t2=$(echo "scale=9;($t1-$t0) / 1000000000" | bc)
echo $t2 seconds
date

Python 3.10.18
run test for CSVExampleGen
TensorFlow version: 2.16.1
TFX version: 1.16.0
key=examples, value=OutputChannel(artifact_type=Examples, producer_component_id=CsvExampleGen, output_key=examples, additional_properties={}, additional_custom_properties={}, _input_trigger=None, _is_async=False)
listing files in output_data_dir /kaggle/working/bin/csv_comp_1/testRun:
/kaggle/working/bin/csv_comp_1/testRun/test_csvgenexample/CsvExampleGen/examples/1/Split-test/data_tfrecord-00000-of-00001.gz
/kaggle/working/bin/csv_comp_1/testRun/test_csvgenexample/CsvExampleGen/examples/1/Split-train/data_tfrecord-00000-of-00001.gz
/kaggle/working/bin/csv_comp_1/testRun/test_csvgenexample/CsvExampleGen/examples/1/Split-eval/data_tfrecord-00000-of-00001.gz
14.148350710 seconds
Fri Oct 10 02:28:35 AM UTC 2025


INFO:absl:tensorflow_io is not available: No module named 'tensorflow_io'
INFO:absl:tensorflow_ranking is not available: No module named 'tensorflow_ranking'
INFO:absl:tensorflow_text is not available: No module named 'tensorflow_text'
INFO:absl:tensorflow_decision_forests is not available: No module named 'tensorflow_decision_forests'
INFO:absl:struct2tensor is not available: No module named 'struct2tensor'
INFO:absl:tensorflow_text is not available.
INFO:absl:tensorflow_recommenders is not available.
INFO:absl:Running driver for CsvExampleGen
INFO:absl:MetadataStore with DB connection initialized
DEBUG:absl:ConnectionConfig: sqlite {
}

DEBUG:absl:Processing input /kaggle/working/ml-1m/tmp/.
INFO:absl:select span and version = (0, None)
INFO:absl:latest span and version = (0, None)
DEBUG:absl:Resolved input artifacts are: {}
DEBUG:absl:ID of run context test_csvgenexample is 1.
DEBUG:absl:Pipeline context [test_csvgenexample : 1]
DEBUG:absl:ID of run context test_csvgenexample.csv_co

In [38]:
%%bash
head -n 5 /kaggle/working/ml-1m/ratings.dat
head -n 5 /kaggle/working/ml-1m/users.dat
head -n 5 /kaggle/working/ml-1m/movies.dat

1::1193::5::978300760
1::661::3::978302109
1::914::3::978301968
1::3408::4::978300275
1::2355::5::978824291
1::F::1::10::48067
2::M::56::16::70072
3::M::25::15::55117
4::M::45::7::02460
5::M::25::20::55455
1::Toy Story (1995)::Animation|Children's|Comedy
2::Jumanji (1995)::Adventure|Children's|Fantasy
3::Grumpier Old Men (1995)::Comedy|Romance
4::Waiting to Exhale (1995)::Comedy|Drama
5::Father of the Bride Part II (1995)::Comedy


In [39]:
!find /kaggle/working -type f


/kaggle/working/ingest_movie_lens_beam_test.py
/kaggle/working/ingest_movie_lens_component.py
/kaggle/working/dataset_tfxio_example.py
/kaggle/working/ml-1m/tmp/users2.dat
/kaggle/working/ml-1m/users.dat
/kaggle/working/ml-1m/movies.dat
/kaggle/working/ml-1m/ratings.dat
/kaggle/working/ml-1m/README
/kaggle/working/ingest_movie_lens_custom_component_test.py
/kaggle/working/ingest_movie_lens_custom_component.py
/kaggle/working/csv_example_gen_test.py
/kaggle/working/movie_lens_utils_test.py
/kaggle/working/movie_lens_utils.py
/kaggle/working/__pycache__/ingest_movie_lens_beam_test.cpython-310.pyc
/kaggle/working/__pycache__/movie_lens_utils_test.cpython-310.pyc
/kaggle/working/__pycache__/CustomUTF8Coder.cpython-310.pyc
/kaggle/working/__pycache__/ingest_movie_lens_component.cpython-310.pyc
/kaggle/working/__pycache__/ingest_movie_lens_custom_component.cpython-310.pyc
/kaggle/working/__pycache__/ingest_movie_lens_custom_component_test.cpython-310.pyc
/kaggle/working/__pycache__/csv_examp

In [40]:
%%bash

. /usr/local/bin/activate
conda activate my_tfx_env

python --version

echo "run test for utils methods"

t0=$(date +%s%N)

python -m unittest /kaggle/working/movie_lens_utils_test.py

t1=$(date +%s%N)
t2=$(echo "scale=9;($t1-$t0) / 1000000000" | bc)
echo $t2 seconds
date

Python 3.10.18
run test for utils methods
5.061283809 seconds
Fri Oct 10 02:28:41 AM UTC 2025


...
----------------------------------------------------------------------
Ran 3 tests in 0.001s

OK


In [41]:
%%bash

. /usr/local/bin/activate
conda activate my_tfx_env

python --version

echo "run test for beam transforms"

t0=$(date +%s%N)

python -m unittest /kaggle/working/ingest_movie_lens_beam_test.py

t1=$(date +%s%N)
t2=$(echo "scale=9;($t1-$t0) / 1000000000" | bc)
echo $t2 seconds
date

Python 3.10.18
run test for beam transforms
91.207950183 seconds
Fri Oct 10 02:30:14 AM UTC 2025


DEBUG:absl:columns=[('user_id', <class 'int'>), ('movie_id', <class 'int'>), ('rating', <class 'int'>), ('timestamp', <class 'int'>), ('gender', <class 'str'>), ('age', <class 'int'>), ('occupation', <class 'int'>), ('genres', <class 'str'>)]
.s
----------------------------------------------------------------------
Ran 2 tests in 84.739s

OK (skipped=1)


In [42]:
%%bash

. /usr/local/bin/activate
conda activate my_tfx_env

python --version

echo "run test for tfx python function custom component"

t0=$(date +%s%N)

python -m unittest /kaggle/working/ingest_movie_lens_component_test.py

t1=$(date +%s%N)
t2=$(echo "scale=9;($t1-$t0) / 1000000000" | bc)
echo $t2 seconds
date


Python 3.10.18
run test for tfx python function custom component
TensorFlow version: 2.16.1
TFX version: 1.16.0
alt_output_data_dir=/tmp/runlvidiydx/tmpcofk59q1/test_ingest_movie_lens_component
key=output_examples
  value=OutputChannel(artifact_type=Examples, producer_component_id=ingest_movie_lens_component, output_key=output_examples, additional_properties={}, additional_custom_properties={}, _input_trigger=None, _is_async=False)
listing files in PIPELINE_ROOT /kaggle/working/bin/py_custom_comp_1/test_ingest_movie_lens_component/TestPythonFuncCustomCompPipeline:
/kaggle/working/bin/py_custom_comp_1/test_ingest_movie_lens_component/TestPythonFuncCustomCompPipeline/tfx_metadata/metadata.db
/kaggle/working/bin/py_custom_comp_1/test_ingest_movie_lens_component/TestPythonFuncCustomCompPipeline/ingest_movie_lens_component/output_examples/1/Split-test/data_tfrecord-00000-of-00001.tfrecord
/kaggle/working/bin/py_custom_comp_1/test_ingest_movie_lens_component/TestPythonFuncCustomCompPipeline/

DEBUG:absl:test output_config=split_config {
  splits {
    name: "train"
    hash_buckets: 80
  }
  splits {
    name: "eval"
    hash_buckets: 10
  }
  splits {
    name: "test"
    hash_buckets: 10
  }
}

DEBUG:absl:TYPE of ratings_example_gen=<class 'ingest_movie_lens_component.ingest_movie_lens_component'>
INFO:absl:Using deployment config:
 executor_specs {
  key: "ingest_movie_lens_component"
  value {
    beam_executable_spec {
      python_executor_spec {
        class_path: "ingest_movie_lens_component.ingest_movie_lens_component_Executor"
      }
    }
  }
}
metadata_connection_config {
  database_connection_config {
    sqlite {
      filename_uri: "/kaggle/working/bin/py_custom_comp_1/test_ingest_movie_lens_component/TestPythonFuncCustomCompPipeline/tfx_metadata/metadata.db"
      connection_mode: READWRITE_OPENCREATE
    }
  }
}

INFO:absl:Using connection config:
 sqlite {
  filename_uri: "/kaggle/working/bin/py_custom_comp_1/test_ingest_movie_lens_component/TestPythonFu

In [43]:
!find /kaggle/working -type f


/kaggle/working/ingest_movie_lens_beam_test.py
/kaggle/working/ingest_movie_lens_component.py
/kaggle/working/dataset_tfxio_example.py
/kaggle/working/ml-1m/tmp/users2.dat
/kaggle/working/ml-1m/users.dat
/kaggle/working/ml-1m/movies.dat
/kaggle/working/ml-1m/ratings.dat
/kaggle/working/ml-1m/README
/kaggle/working/ingest_movie_lens_custom_component_test.py
/kaggle/working/ingest_movie_lens_custom_component.py
/kaggle/working/csv_example_gen_test.py
/kaggle/working/movie_lens_utils_test.py
/kaggle/working/movie_lens_utils.py
/kaggle/working/__pycache__/ingest_movie_lens_beam_test.cpython-310.pyc
/kaggle/working/__pycache__/movie_lens_utils_test.cpython-310.pyc
/kaggle/working/__pycache__/CustomUTF8Coder.cpython-310.pyc
/kaggle/working/__pycache__/ingest_movie_lens_component.cpython-310.pyc
/kaggle/working/__pycache__/ingest_movie_lens_custom_component.cpython-310.pyc
/kaggle/working/__pycache__/ingest_movie_lens_custom_component_test.cpython-310.pyc
/kaggle/working/__pycache__/csv_examp

In [47]:
%%bash

. /usr/local/bin/activate
conda activate my_tfx_env

python --version

echo "run test for TFX fully custom component"

t0=$(date +%s%N)

python -m unittest /kaggle/working/ingest_movie_lens_custom_component_test.py

t1=$(date +%s%N)
t2=$(echo "scale=9;($t1-$t0) / 1000000000" | bc)
echo $t2 seconds
date

Python 3.10.18
run test for TFX fully custom component
TensorFlow version: 2.16.1
TFX version: 1.16.0
alt_output_data_dir=/tmp/run669s4bzy/tmpvj8u5_nm/testRun2
key=output_examples, value=OutputChannel(artifact_type=Examples, producer_component_id=IngestMovieLensComponent, output_key=output_examples, additional_properties={}, additional_custom_properties={}, _input_trigger=None, _is_async=False)
listing files in output_data_dir /kaggle/working/bin/fully_custom_comp_1/testRun2/TestFullyCustomCompPipeline:
/kaggle/working/bin/fully_custom_comp_1/testRun2/TestFullyCustomCompPipeline/tfx_metadata/metadata.db
/kaggle/working/bin/fully_custom_comp_1/testRun2/TestFullyCustomCompPipeline/IngestMovieLensComponent/output_examples/1/Split-test/data_tfrecord-00000-of-00001.gz
/kaggle/working/bin/fully_custom_comp_1/testRun2/TestFullyCustomCompPipeline/IngestMovieLensComponent/output_examples/1/Split-train/data_tfrecord-00000-of-00001.gz
/kaggle/working/bin/fully_custom_comp_1/testRun2/TestFullyCust

DEBUG:absl:test self.output_config=split_config {
  splits {
    name: "train"
    hash_buckets: 80
  }
  splits {
    name: "eval"
    hash_buckets: 10
  }
  splits {
    name: "test"
    hash_buckets: 10
  }
}

DEBUG:absl:in IngestMovieLensExecutor.Do
DEBUG:absl:in IngestMovieLensExecutor.GenerateExamplesByBeam
DEBUG:absl:about to read input and transform to tf.train.Example
DEBUG:absl:infiles_dict_ser=gASVCQIAAAAAAAB9lCiMB3JhdGluZ3OUfZQojARjb2xzlH2UKIwHdXNlcl9pZJR9lCiMBWluZGV4lEsAjAR0eXBllIwIYnVpbHRpbnOUjANpbnSUk5R1jAhtb3ZpZV9pZJR9lChoB0sBaAhoC3WMBnJhdGluZ5R9lChoB0sCaAhoC3WMCXRpbWVzdGFtcJR9lChoB0sDaAhoC3V1jAN1cmmUjCEva2FnZ2xlL3dvcmtpbmcvbWwtMW0vcmF0aW5ncy5kYXSUjA9oZWFkZXJzX3ByZXNlbnSUiYwFZGVsaW2UjAI6OpR1jAZtb3ZpZXOUfZQoaAN9lChoDH2UKGgHSwBoCGgLdYwFdGl0bGWUfZQoaAdLAWgIaAmMA3N0cpSTlHWMBmdlbnJlc5R9lChoB0sCaAhoHnV1aBKMIC9rYWdnbGUvd29ya2luZy9tbC0xbS9tb3ZpZXMuZGF0lGgUiWgVaBZ1jAV1c2Vyc5R9lChoA32UKGgFfZQoaAdLAGgIaAt1jAZnZW5kZXKUfZQoaAdLAWgIaB51jANhZ2WUfZQoaAdLAmgIaAt1jApvY2N1cGF0aW9ulH2UKGgH

In [45]:
!find /kaggle/working -type f

/kaggle/working/ingest_movie_lens_beam_test.py
/kaggle/working/ingest_movie_lens_component.py
/kaggle/working/dataset_tfxio_example.py
/kaggle/working/ml-1m/tmp/users2.dat
/kaggle/working/ml-1m/users.dat
/kaggle/working/ml-1m/movies.dat
/kaggle/working/ml-1m/ratings.dat
/kaggle/working/ml-1m/README
/kaggle/working/ingest_movie_lens_custom_component_test.py
/kaggle/working/ingest_movie_lens_custom_component.py
/kaggle/working/csv_example_gen_test.py
/kaggle/working/movie_lens_utils_test.py
/kaggle/working/movie_lens_utils.py
/kaggle/working/__pycache__/ingest_movie_lens_beam_test.cpython-310.pyc
/kaggle/working/__pycache__/movie_lens_utils_test.cpython-310.pyc
/kaggle/working/__pycache__/CustomUTF8Coder.cpython-310.pyc
/kaggle/working/__pycache__/ingest_movie_lens_component.cpython-310.pyc
/kaggle/working/__pycache__/ingest_movie_lens_custom_component.cpython-310.pyc
/kaggle/working/__pycache__/ingest_movie_lens_custom_component_test.cpython-310.pyc
/kaggle/working/__pycache__/csv_examp