<a name="top"></a><a id="top"></a>
# Tests on tf.train.Feature data types
   
<a href="https://colab.research.google.com/github/gbih/ml-notes/blob/main/tf_record_tftrain/nb_002_tftrainFeature_non_scalars.ipynb">
<strong>View in Colab</strong>
</a>

1. [Setup](#setup)
2. [Introduction](#2.0)
3. [tf.train.BytesList](#3.0)
    * 3.1 [byte](#3.1)
        - 3.1.1 [image as byte](#3.3.1)
    * 3.2 [string](#3.2)
    * 3.3 [image](#3.3.1)
4. [tf.train.FloatList](#4.0)
    * 4.1 [float32](#4.1)
    * 4.2 [float64](#4.2)
5. [tf.train.Int64List](#5.0)
    * 5.1 [bool](#5.1)
    * 5.2 [enum](#5.2)
    * 5.3 [int32](#5.3)
    * 5.4 [uint32](#5.4)
    * 5.5 [int64](#5.5)
    * 5.6 [uint64](#5.6)

---
<a id="setup"></a><a name="setup"></a>
# 1. Setup
<a href="#top">[back to top]</a>

In [1]:
import glob
#import IPython.display as display
# import matplotlib.pyplot as plt
import numpy as np
import os
import pprint as pp
import tensorflow as tf

# To make this notebook's output stable across runs
tf.random.set_seed(42)
np.random.seed(42)

def HR():
    print("-"*40)
    
print("Libraries loaded..")

Libraries loaded..


---
<a id="2.0"></a><a name="2.0"></a>
# 2. Introduction
<a href="#top">[back to top]</a>

According to the [official documentation](https://www.tensorflow.org/tutorials/load_data/tfrecord), the `tf.train.Feature` message type can accept one of the following three types. Most other generic types can be coerced into one of these:

1. `tf.train.BytesList` (the following types can be coerced)
    - `byte`
    - `string`
2. `tf.train.FloatList` (the following types can be coerced)
    - `float` (`float32`)
    - `double` (`float64`)
3. `tf.train.Int64List` (the following types can be coerced)
    - `bool`
    - `enum`
    - `int32`
    - `uint32`
    - `int64`
    - `uint64`

**Note**: Each function here takes a scalar input value and returns a `tf.train.Feature` containing one of the three list types.



---
<a id='3.0'></a><a name="3.0"></a>
# 3. tf.train.BytesList
<a href="#top">[back to top]</a>

* 3.1 [byte](#3.1)
* 3.2 [string](#3.2)
* 3.3 [image](#3.3.1)

Used in tf.train.Example protos. Holds a list of byte-strings.

**Note**: If we pass an `EagerTensor`, `tf.train.BytesList` does not unpack the underlying value, resulting in an error like this:
   
```python
TypeError: <tf.Tensor: shape=(), dtype=int32, numpy=0> has type tensorflow.python.framework.ops.EagerTensor, but expected one of: byte
```

In [2]:
# scalar bytes
def bytes_feature(value):
    """Returns a bytes_list from a string / byte."""
    # tf.constant(0) is arbitrary, just use it to test for eager_tensor type
    eager_tensor_type = type(tf.constant(0))
    
    if isinstance(value, eager_tensor_type):
        value = value.numpy()
    return tf.train.Feature(
        bytes_list=tf.train.BytesList(value=[value])
    )


# variation w/o unpacking value from EagerTensor
def bytes_feature_no_conversion(value):
    """Returns a bytes_list from a string / byte."""
    try:
        return tf.train.Feature(
            bytes_list=tf.train.BytesList(value=[value])
        )
    except Exception as e:
        print(f"Error: {e}")

<a id='3.1'></a><a name="3.1"></a>
## 3.1 byte
<a href="#top">[back to top]</a>

In [36]:
bytes_byte1 = tf.Variable("test_string as tf.Variable").value()
assert isinstance(bytes_byte1, type(tf.constant(0))) # EagerTensor

bytes_byte2 = tf.constant("test_string as tf.constant")
assert isinstance(bytes_byte2, type(tf.constant(0))) # EagerTensor

bytes_byte3 = b'this is sentence of byte-type'
assert isinstance(bytes_byte3, bytes) # plain Python bytes, not an EagerTensor

print(bytes_feature(bytes_byte1))
HR()

print(bytes_feature(bytes_byte2))
HR()

print(bytes_feature(bytes_byte3))

bytes_list {
  value: "test_string as tf.Variable"
}

----------------------------------------
bytes_list {
  value: "test_string as tf.constant"
}

----------------------------------------
bytes_list {
  value: "this is sentence of byte-type"
}



<a id='3.3.1'></a><a name="3.3.1"></a>
### 3.3.1 image
<a href="#top">[back to top]</a>

We can cast images as bytes.

Use [`tf.io.encode_jpeg`](https://www.tensorflow.org/api_docs/python/tf/io/encode_jpeg) to JPEG-encode the data so that it can be passed to `tf.train.BytesList`. The input is a 3-D uint8 Tensor of shape [height, width, channels].

```python
tf.io.encode_jpeg(
    image,
    format='',
    quality=95,
    progressive=False,
    optimize_size=False,
    chroma_downsampling=True,
    density_unit='in',
    x_density=300,
    y_density=300,
    xmp_metadata='',
    name=None
)
```

In [4]:
def image_feature(value):
    """Returns a bytes_list holding the JPEG encoding of an image array."""
    # tf.constant(0) is arbitrary, just use it to test for eager_tensor type
    eager_tensor_type = type(tf.constant(0))
    if isinstance(value, eager_tensor_type):
        value = value.numpy() 
    return tf.train.Feature(
        bytes_list = tf.train.BytesList(
            value = [tf.io.encode_jpeg(value).numpy()]
        )
    )

R = np.zeros([128 * 128])
G = np.ones([128 * 128]) * 100
B = np.ones([128 * 128]) * 200

# w/o reshape, the shape is (16384, 3)
data = np.array(list(zip(R, G, B)), dtype=np.uint8)
print("Before reshaping image:")
print(data.shape)
HR()

# after reshape
data = data.reshape(128, 128, 3)
print("After reshaping image:")
print(data.shape)
HR()

# sample of data 
print("Sample of data:")
print(data[0][:5][:])
HR()

bytes_image1 = image_feature(data)
print(bytes_image1)

Before reshaping image:
(16384, 3)
----------------------------------------
After reshaping image:
(128, 128, 3)
----------------------------------------
Sample of data:
[[  0 100 200]
 [  0 100 200]
 [  0 100 200]
 [  0 100 200]
 [  0 100 200]]
----------------------------------------
bytes_list {
  value: "\377\330\377\340\000\020JFIF\000\001\001\001\001,\001,\000\000\377\333\000C\000\002\001\001\001\001\001\002\001\001\001\002\002\002\002\002\004\003\002\002\002\002\005\004\004\003\004\006\005\006\006\006\005\006\006\006\007\t\010\006\007\t\007\006\006\010\013\010\t\n\n\n\n\n\006\010\013\014\013\n\014\t\n\n\n\377\333\000C\001\002\002\002\002\002\002\005\003\003\005\n\007\006\007\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\377\300\000\021\010\000\200\000\200\003\001\"\000\002\021\001\003\021\001\377\304\000\037\000\000\001\005\001\001\001\001\001\001\000\000\000\000\000\000\000\000\001\002\003\004\005\006\007\010\t\n\013\377\304

<a id='3.2'></a><a name="3.2"></a>
## 3.2 string
<a href="#top">[back to top]</a>

We cannot pass a Python `str` directly; we must first encode it to type `bytes`.

In [82]:
bytes_str1 = "test_string as string"
assert isinstance(bytes_str1, str) 

# Error when using string type w/o casting to bytes
try:
    print(bytes_feature(bytes_str1))
except Exception as e:
    print(f"Error: {e}")
HR()

# w/o encode, u'..' is a string; with encode, this becomes bytes
bytes_str2 = u'test_bytes'.encode('utf-8')
assert isinstance(bytes_str2, bytes)

bytes_str3 = b'test_string'
assert isinstance(bytes_str3, bytes)

bytes_str4 = "test_string".encode()
assert isinstance(bytes_str4, bytes)

Error: 'test_string as string' has type str, but expected one of: bytes
----------------------------------------


In [83]:
# Test of the function that does not unpack eager tensors

print(bytes_feature_no_conversion(bytes_str1))
HR()

print(bytes_feature_no_conversion(bytes_str2))
HR()

print(bytes_feature_no_conversion(bytes_str3))
HR()

print(bytes_feature_no_conversion(bytes_str4)) 

Error: 'test_string as string' has type str, but expected one of: bytes
None
----------------------------------------
bytes_list {
  value: "test_bytes"
}

----------------------------------------
bytes_list {
  value: "test_string"
}

----------------------------------------
bytes_list {
  value: "test_string"
}
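
The cells above show that a `str` must be encoded before it reaches `tf.train.BytesList`, while `bytes` can be passed as-is. A minimal sketch of a normalizing step, using only the standard library, might look like the following. The helper name `to_bytes` and the UTF-8 default are assumptions for illustration and are not part of this notebook.

```python
def to_bytes(value, encoding="utf-8"):
    """Return `value` as bytes, encoding str input (hypothetical helper)."""
    if isinstance(value, bytes):
        return value
    if isinstance(value, str):
        return value.encode(encoding)
    raise TypeError(f"expected str or bytes, got {type(value).__name__}")


print(to_bytes("test_string"))   # str input is encoded to bytes
print(to_bytes(b"test_string"))  # bytes input passes through unchanged
```

The result could then be handed to `bytes_feature()`, which would avoid the `has type str, but expected one of: bytes` error shown above.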



---
<a id='4.0'></a><a name="4.0"></a>
# 4. [tf.train.FloatList](https://www.tensorflow.org/api_docs/python/tf/train/FloatList)
<a href="#top">[back to top]</a>

Used in tf.train.Example protos. Holds a list of floats.

* 4.1 [float32](#4.1)
* 4.2 [float64](#4.2)

In [84]:
def float_feature(value):
    """Returns a float_list from a float / double."""
    return tf.train.Feature(
        float_list=tf.train.FloatList(value=[value])
    )

<a id='4.1'></a><a name="4.1"></a>
## 4.1 float32
<a href="#top">[back to top]</a>

32-bit floating-point number

In [79]:
float32_min = np.float32(np.finfo(np.float32).min)
float32_max = np.float32(np.finfo(np.float32).max)
print(f"{float32_min:.2f}")
print(f"{float32_max:.2f}")

-340282346638528859811704183484516925440.00
340282346638528859811704183484516925440.00


In [85]:
float32_1 = np.exp(1, dtype=np.float32)
print(float32_1)
print(type(float32_1))
assert isinstance(float32_1, np.float32) 
assert type(float32_1) == np.float32
HR()

float32_2 = np.float32(33.9)
print(float32_2)
print(type(float32_2))
assert isinstance(float32_2, np.float32) 
assert type(float32_2) == np.float32
HR()

# np.float32 max
float32_3 = np.float32(np.finfo(np.float32).max)
print(float32_3 )
print(type(float32_3))
assert isinstance(float32_3, np.float32) 
assert type(float32_3) == np.float32
HR()

print(float_feature(float32_1))
HR()

print(float_feature(float32_2))
HR()

print(float_feature(float32_3))
HR()

2.718282
<class 'numpy.float32'>
----------------------------------------
33.9
<class 'numpy.float32'>
----------------------------------------
3.4028235e+38
<class 'numpy.float32'>
----------------------------------------
float_list {
  value: 2.7182819843292236
}

----------------------------------------
float_list {
  value: 33.900001525878906
}

----------------------------------------
float_list {
  value: 3.4028234663852886e+38
}

----------------------------------------


<a id='4.2'></a><a name="4.2"></a>
## 4.2 float64
<a href="#top">[back to top]</a>

64-bit floating-point number

In [81]:
float64_min = np.float64(np.finfo(np.float64).min)
float64_max = np.float64(np.finfo(np.float64).max)
print(f"{float64_min:.2f}")
HR()
print(f"{float64_max:.2f}")

-179769313486231570814527423731704356798070567525844996598917476803157260780028538760589558632766878171540458953514382464234321326889464182768467546703537516986049910576551282076245490090389328944075868508455133942304583236903222948165808559332123348274797826204144723168738177180919299881250404026184124858368.00
----------------------------------------
179769313486231570814527423731704356798070567525844996598917476803157260780028538760589558632766878171540458953514382464234321326889464182768467546703537516986049910576551282076245490090389328944075868508455133942304583236903222948165808559332123348274797826204144723168738177180919299881250404026184124858368.00


In [86]:
float64_1 = np.exp(1, dtype=np.float64)
print(float64_1)
print(type(float64_1))
assert isinstance(float64_1, np.float64) 
assert type(float64_1) == np.float64
HR()

float64_2 = np.float64(33.9)
print(float64_2)
print(type(float64_2))
assert isinstance(float64_2, np.float64) 
assert type(float64_2) == np.float64
HR()

# np.float64 max
float64_3 = np.float64(np.finfo(np.float64).max)
print(float64_3 )
print(type(float64_3))
assert isinstance(float64_3, np.float64) 
assert type(float64_3) == np.float64
HR()


print(float_feature(float64_1))
HR()

print(float_feature(float64_2))
HR()

print(float_feature(float64_3)) 
# tf.train.FloatList stores 32-bit floats, so the float64 max
# overflows to inf when it is stored:
# float_list {
#   value: inf
# }

2.718281828459045
<class 'numpy.float64'>
----------------------------------------
33.9
<class 'numpy.float64'>
----------------------------------------
1.7976931348623157e+308
<class 'numpy.float64'>
----------------------------------------
float_list {
  value: 2.7182817459106445
}

----------------------------------------
float_list {
  value: 33.900001525878906
}

----------------------------------------
float_list {
  value: inf
}
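
The `float_list` values printed for the `float64` inputs match the ones printed for the `float32` inputs, which is consistent with `tf.train.FloatList` holding single-precision values. The rounding can be reproduced without TensorFlow; the sketch below uses the standard-library `struct` module, and the helper name `as_float32` is an assumption for illustration.

```python
import math
import struct


def as_float32(x):
    """Round a Python float (64-bit) to the nearest 32-bit float."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


# Compare these with the float_list values printed for float64_1 and float64_2.
print(as_float32(math.e))
print(as_float32(33.9))
```

Values outside the `float32` range, such as the `float64` max, cannot be represented at all, which is why that case is stored as `inf`. It is not reproduced here because `struct.pack` rejects such values instead of rounding them to infinity.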



---
<a id='5.0'></a><a name="5.0"></a>
# 5. [tf.train.Int64List](https://www.tensorflow.org/api_docs/python/tf/train/Int64List)
<a href="#top">[back to top]</a>

Used in tf.train.Example protos. Holds a list of Int64s.

* 5.1 [bool](#5.1)
* 5.2 [enum](#5.2)
* 5.3 [int32](#5.3)
* 5.4 [uint32](#5.4)
* 5.5 [int64](#5.5)
* 5.6 [uint64](#5.6)


**Note:**
We wrap the value with `int`, as in `value=[int(value)]`, to avoid this warning when we pass in NumPy booleans (`np.bool_`):

```python
DeprecationWarning: In future, it will be an error for 'np.bool_' scalars to be interpreted as an index
```

In [10]:
def int64_feature(value):
    """Returns an int64_list from a bool / enum / int / uint"""
    return tf.train.Feature(int64_list=tf.train.Int64List(
        value=[int(value)])
    )

<a id='5.1'></a><a name="5.1"></a>
## 5.1 bool
<a href="#top">[back to top]</a>

In [87]:
bool_1 = np.ones(1, dtype=bool)[0]
print(type(bool_1))
assert type(bool_1) == np.bool_
assert isinstance(bool_1, np.bool_) 
HR()

bool_2 = bool("Hello")
print(type(bool_2))
assert type(bool_2) == bool
assert isinstance(bool_2, bool)
HR()

bool_3 = False
print(type(bool_3))
assert type(bool_3) == bool
assert isinstance(bool_3, bool)
HR()

#####

print(int64_feature(bool_1))
HR()

print(int64_feature(bool_2))
HR()

print(int64_feature(bool_3))

<class 'numpy.bool_'>
----------------------------------------
<class 'bool'>
----------------------------------------
<class 'bool'>
----------------------------------------
int64_list {
  value: 1
}

----------------------------------------
int64_list {
  value: 1
}

----------------------------------------
int64_list {
  value: 0
}



<a id='5.2'></a><a name="5.2"></a>
## 5.2 enum
<a href="#top">[back to top]</a>

In [12]:
# Different Enum types: EnumMeta, Enum, IntEnum, Flag, IntFlag, auto, unique
import enum

# Need to pass an enum which is a subclass of int, so it will be compatible with tf.train.Int64List
class Color(enum.IntEnum):
    RED = 1
    GREEN = 2
    BLUE = 3

print(f"type of Color.RED: {type(Color.RED)}")

assert str(type(Color.RED)) == "<enum 'Color'>"

assert isinstance(Color.RED, enum.IntEnum)
HR()

print(int64_feature(Color.RED))

type of Color.RED: <enum 'Color'>
----------------------------------------
int64_list {
  value: 1
}



In [13]:
# Example of improper Enum type
import enum

# enum.Enum is not a subclass of int, so int() cannot convert its members
class Color2(enum.Enum):
    RED = 1
    GREEN = 2
    BLUE = 3

print(f"type of Color2.RED: {type(Color2.RED)}")

assert str(type(Color2.RED)) == "<enum 'Color2'>"

assert isinstance(Color2.RED, enum.Enum)
HR()

try:
    print(int64_feature(Color2.RED))
except Exception as e:
    print(f"{type(e).__name__} : {e}")

type of Color2.RED: <enum 'Color2'>
----------------------------------------
TypeError : int() argument must be a string, a bytes-like object or a number, not 'Color2'
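
A plain `enum.Enum` member fails because `int()` cannot convert it, but its underlying `.value` can still be stored when that value is an integer. The following is a minimal sketch of one way to handle both enum kinds; the helper name `enum_to_int` is hypothetical and not part of this notebook.

```python
import enum


class Color2(enum.Enum):
    RED = 1
    GREEN = 2
    BLUE = 3


def enum_to_int(member):
    """Return an int for IntEnum members and for Enum members with int values."""
    if isinstance(member, int):   # IntEnum members are already ints
        return int(member)
    return int(member.value)      # plain Enum: unwrap the underlying value


print(enum_to_int(Color2.RED))
```

The returned integer could then be passed to `int64_feature()` in place of the enum member itself.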


<a id='5.3'></a><a name="5.3"></a>
## 5.3 int32
<a href="#top">[back to top]</a>

int32: 32-bit signed integer

In [14]:
int32_min = np.int32(np.iinfo(np.int32).min)
int32_max = np.int32(np.iinfo(np.int32).max)
print(f"{int32_min:,}")
print(f"{int32_max:,}")

-2,147,483,648
2,147,483,647


In [88]:
int32_1 = np.power(2, 3, dtype=np.int32)
print(int32_1)
print(type(int32_1))
assert str(type(int32_1)) == "<class 'numpy.int32'>"
assert type(int32_1) == np.int32
assert isinstance(int32_1, np.int32) 
HR()

int32_2 = np.int32(100)
print(int32_2)
print(type(int32_2))
assert str(type(int32_2)) == "<class 'numpy.int32'>"
assert type(int32_2) == np.int32
assert isinstance(int32_2, np.int32)
HR()

# np.int32 max
int32_3 = np.int32(np.iinfo(np.int32).max)
print(int32_3 )
print(type(int32_3))
assert isinstance(int32_3, np.int32) 
assert type(int32_3) == np.int32
HR()

#####

print(int64_feature(int32_1))
HR()

print(int64_feature(int32_2))
HR()

print(int64_feature(int32_3))
HR()

8
<class 'numpy.int32'>
----------------------------------------
100
<class 'numpy.int32'>
----------------------------------------
2147483647
<class 'numpy.int32'>
----------------------------------------
int64_list {
  value: 8
}

----------------------------------------
int64_list {
  value: 100
}

----------------------------------------
int64_list {
  value: 2147483647
}

----------------------------------------


<a id='5.4'></a><a name="5.4"></a>
## 5.4 uint32
<a href="#top">[back to top]</a>

uint32: 32-bit unsigned integer

In [16]:
uint32_min = np.uint32(np.iinfo(np.uint32).min)
uint32_max = np.uint32(np.iinfo(np.uint32).max)
print(f"{uint32_min:,}")
print(f"{uint32_max:,}")

0
4,294,967,295


In [89]:
uint32_1 = np.power(2, 3, dtype=np.uint32)
print(uint32_1)
print(type(uint32_1))
assert type(uint32_1) == np.uint32
assert str(type(uint32_1)) == "<class 'numpy.uint32'>"
assert isinstance(uint32_1, np.uint32) 

HR()

uint32_2 = np.uint32(100)
print(uint32_2)
print(type(uint32_2))
assert type(uint32_2) == np.uint32
assert str(type(uint32_2)) == "<class 'numpy.uint32'>"
assert isinstance(uint32_2, np.uint32)
HR()

# np.uint32 max
uint32_3 = np.uint32(np.iinfo(np.uint32).max)
print(uint32_3)
print(type(uint32_3))
assert type(uint32_3) == np.uint32
assert str(type(uint32_3)) == "<class 'numpy.uint32'>"
HR()

#####

print(int64_feature(uint32_1))
HR()

print(int64_feature(uint32_2))
HR()

print(int64_feature(uint32_3))

8
<class 'numpy.uint32'>
----------------------------------------
100
<class 'numpy.uint32'>
----------------------------------------
4294967295
<class 'numpy.uint32'>
----------------------------------------
int64_list {
  value: 8
}

----------------------------------------
int64_list {
  value: 100
}

----------------------------------------
int64_list {
  value: 4294967295
}



<a id='5.5'></a><a name="5.5"></a>
## 5.5 int64
<a href="#top">[back to top]</a>

int64: 64-bit signed integer

In [18]:
int64_min = np.int64(np.iinfo(np.int64).min)
int64_max = np.int64(np.iinfo(np.int64).max)
print(f"{int64_min:,}")
print(f"{int64_max:,}")

-9,223,372,036,854,775,808
9,223,372,036,854,775,807


In [90]:
int64_1 = np.power(2, 3, dtype=np.int64)
print(int64_1)
print(type(int64_1))
assert type(int64_1) == np.int64
assert str(type(int64_1)) == "<class 'numpy.int64'>"
assert isinstance(int64_1, np.int64) 
HR()

int64_2 = np.int64(100)
print(int64_2)
print(type(int64_2))
assert type(int64_2) == np.int64
assert str(type(int64_2)) == "<class 'numpy.int64'>"
assert isinstance(int64_2, np.int64)
HR()

# np.int64 max
int64_3 = np.int64(np.iinfo(np.int64).max)
print(int64_3)
print(type(int64_3))
assert type(int64_3) == np.int64
assert str(type(int64_3)) == "<class 'numpy.int64'>"
HR()

#####

print(int64_feature(int64_1))
HR()

print(int64_feature(int64_2))
HR()

print(int64_feature(int64_3))
HR()

8
<class 'numpy.int64'>
----------------------------------------
100
<class 'numpy.int64'>
----------------------------------------
9223372036854775807
<class 'numpy.int64'>
----------------------------------------
int64_list {
  value: 8
}

----------------------------------------
int64_list {
  value: 100
}

----------------------------------------
int64_list {
  value: 9223372036854775807
}

----------------------------------------


<a id='5.6'></a><a name="5.6"></a>
## 5.6 uint64
<a href="#top">[back to top]</a>

uint64: 64-bit unsigned integer

**Note:**

1. When we try to pass `tf.train.Int64List` a value exceeding the `np.int64` max (e.g. 9,223,372,036,854,775,807 + 1), the `np.int64(...)` conversion in the test below fails before the value reaches `tf.train.Int64List`:

```python
OverflowError
Python int too large to convert to C long
```

2. When we try to use `tf.train.Int64List` with the max value of `np.uint64`, we get this error:

```python
ValueError
Value out of range: 18446744073709551615
```

In [20]:
uint64_min = np.uint64(np.iinfo(np.uint64).min)
uint64_max = np.uint64(np.iinfo(np.uint64).max)
print(f"{uint64_min:,}")
print(f"{uint64_max:,}")

0
18,446,744,073,709,551,615


In [97]:
# uint64 max
uint64_1 = np.uint64(np.iinfo(np.uint64).max)
print(uint64_1)
print(type(uint64_1))
assert type(uint64_1) == np.uint64
assert str(type(uint64_1)) == "<class 'numpy.uint64'>"
HR()

max_int64 = np.int64(np.iinfo(np.int64).max)
print(f"Test using max np.int64:\n{max_int64:,}\n")
try:
    print(int64_feature(np.int64(np.iinfo(np.int64).max)))
except Exception as e:
    print(type(e).__name__)
    print(e)
    
HR()

max_int64_plus_one = np.uint64(np.iinfo(np.int64).max+1)
print(f"Test using max np.int64+1:\n{max_int64_plus_one:,}\n")
try:
    # np.int64(...) itself raises OverflowError here, before int64_feature() is called
    print(int64_feature(np.int64(np.iinfo(np.int64).max+1)))
except Exception as e:
    print(type(e).__name__)
    print(e)
    
HR()

print(f"Test using max np.uint64:\n{uint64_1:,}\n")
try:
    print(int64_feature(uint64_1))
except Exception as e:
    print(type(e).__name__)
    print(e)

18446744073709551615
<class 'numpy.uint64'>
----------------------------------------
Test using max np.int64:
9,223,372,036,854,775,807

int64_list {
  value: 9223372036854775807
}

----------------------------------------
Test using max np.int64+1:
9,223,372,036,854,775,808

OverflowError
Python int too large to convert to C long
----------------------------------------
Test using max np.uint64:
18,446,744,073,709,551,615

ValueError
Value out of range: 18446744073709551615
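
Both failures above come from values that do not fit in a signed 64-bit integer. A minimal sketch of an explicit range check that could run before building the feature is shown below. The names `INT64_MIN`, `INT64_MAX`, and `check_int64` are assumptions for illustration, not part of this notebook or of TensorFlow.

```python
INT64_MIN = -(2 ** 63)
INT64_MAX = 2 ** 63 - 1


def check_int64(value):
    """Return int(value) if it fits in a signed 64-bit integer, else raise."""
    value = int(value)
    if not INT64_MIN <= value <= INT64_MAX:
        raise ValueError(f"{value} does not fit in a signed 64-bit integer")
    return value


print(check_int64(2 ** 63 - 1))          # int64 max is accepted
for too_big in (2 ** 63, 2 ** 64 - 1):   # int64 max + 1 and uint64 max
    try:
        check_int64(too_big)
    except ValueError as e:
        print(f"ValueError: {e}")
```

Checking the range up front gives one consistent error for both cases, instead of an `OverflowError` in one and a `ValueError` in the other.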


In [23]:
test_bytes_byte_list = [b'test_string1', b'test_string2']

print(type(test_bytes_byte_list))
HR()

# type of outer container (list)
assert isinstance(test_bytes_byte_list, list)

# type of inner elements (bytes)
assert set(map(type, test_bytes_byte_list)) == {bytes}

#####

try:
    # Try using utility function bytes_feature()
    print(bytes_feature(test_bytes_byte_list))
except Exception as e:
    print(type(e).__name__)
    print(e)

# Get this error:
# TypeError
# [b'test_string1', b'test_string2'] has type list, but expected one of: bytes

# Need to treat this as a non-scalar feature (eg feature list)

<class 'list'>
----------------------------------------
TypeError
[b'test_string1', b'test_string2'] has type list, but expected one of: bytes


---
A simple way to handle non-scalar features is to use `tf.io.serialize_tensor` to convert tensors to binary strings. Then pass the result to the utility that wraps `tf.train.BytesList`, which here is `bytes_feature`.

Use `tf.io.parse_tensor` to convert the binary-string back to a tensor.
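
The round trip described above can be sketched in a few lines. This is a minimal, self-contained example that assumes TensorFlow 2.x with eager execution; the variable names are chosen for illustration, and the feature is built inline instead of through `bytes_feature` so the block runs on its own.

```python
import tensorflow as tf

values = [b"test_string1", b"test_string2"]

# Serialize the whole tensor into one scalar byte-string.
serialized = tf.io.serialize_tensor(tf.constant(values))

# Store that single byte-string as a scalar bytes feature.
feature = tf.train.Feature(
    bytes_list=tf.train.BytesList(value=[serialized.numpy()])
)

# Recover the tensor; the dtype must be supplied when parsing.
restored = tf.io.parse_tensor(feature.bytes_list.value[0], out_type=tf.string)
print(restored.numpy().tolist())
```

Note that `tf.io.parse_tensor` needs the original dtype, so it has to be known or stored alongside the serialized data.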

In [24]:
# Detect the element type inside a list
t1 = [tf.constant("tf.constant string 1"), tf.constant("tf.constant string 2")]

print(type(t1))
print(map(type, t1))
print(set(map(type, t1))) 
print(set(map(type, t1)) == {type(tf.constant(0))})
HR()

g = [1,2,4]
print(g)
print(map(type, g))
print(set(map(type, g)))
print(set(map(type, g)) == {int})
HR()

i = set((1, "test", True))
print(i)
print(set(map(type, i)))
print(set(map(type, i))=={int, str})
HR()


# Build a 2-D ndarray of strings.
# TODO - how to detect the type of items in 2-D or deeper nested lists
numpy_original = np.array([["one", "two"], ["three", "four"]])
numpy_original

<class 'list'>
<map object at 0x124f61070>
{<class 'tensorflow.python.framework.ops.EagerTensor'>}
True
----------------------------------------
[1, 2, 4]
<map object at 0x124f61070>
{<class 'int'>}
True
----------------------------------------
{'test', 1}
{<class 'int'>, <class 'str'>}
True
----------------------------------------


array([['one', 'two'],
       ['three', 'four']], dtype='<U5')
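
The `set(map(type, ...))` idiom used above only inspects the top level of a list. One possible way to extend it to nested lists is a small recursive walk; the sketch below is an assumption about how this could be done, and the helper name `element_types` is hypothetical.

```python
def element_types(obj):
    """Return the set of leaf element types in an arbitrarily nested list/tuple."""
    if isinstance(obj, (list, tuple)):
        types = set()
        for item in obj:
            types |= element_types(item)
        return types
    return {type(obj)}


nested = [["one", "two"], ["three", ["four", "five"]]]
print(element_types(nested) == {str})
print(element_types([[1, 2], [3, b"x"]]) == {int, bytes})
```

For an ndarray such as `numpy_original`, calling `.tolist()` first yields nested Python lists that this function can walk; alternatively the array's `dtype` already describes its element type.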

In [25]:
def bytes_feature_list(value):
    """Returns a bytes_list from a list of bytes or of string EagerTensors."""

    # Idiom for checking that all elements of a list share one type,
    # e.g. set(map(type, [1, 2, 3])) == {int}

    # Debug output: container type and the set of element types
    print("type(value): ", type(value))
    print("map(type, value):", map(type, value))
    print("set(map(type, value)):", set(map(type, value)))
    print()
    
    # If every element is an EagerTensor, unpack each one to its bytes value
    if set(map(type, value)) == {type(tf.constant(0))}:
        print(f"** PASSING AN EAGER-TENSOR value:\n{value}\n")
        value = [t.numpy() for t in value]
    return tf.train.Feature(bytes_list=tf.train.BytesList(value=value))
   
    
test_bytes_list_1 = [b'test_string', b'test_string2']
assert isinstance(test_bytes_list_1, list)
assert set(map(type, test_bytes_list_1)) == {bytes}
print(bytes_feature_list(test_bytes_list_1))
HR()


test_bytes_list_2 = [tf.constant("tf.constant string 1"), tf.constant("tf.constant string 2")]
assert isinstance(test_bytes_list_2, list)
assert set(map(type, test_bytes_list_2)) == {type(tf.constant(0))}
print(bytes_feature_list(test_bytes_list_2))




# Converts the given value to a Tensor.
numpy_original = np.array([["one", "two"], ["three", "four"]], dtype=str)
tensor_from_numpy = tf.convert_to_tensor(
    numpy_original,
    dtype=tf.string
)
print("tensor_from_numpy")
print(type(tensor_from_numpy))
HR()
print(isinstance(tensor_from_numpy, tf.Tensor))
assert set(map(type, tensor_from_numpy)) == {type(tf.constant(0))}

type(value):  <class 'list'>
map(type, value): <map object at 0x12ddbbfa0>
set(map(type, value)): {<class 'bytes'>}

bytes_list {
  value: "test_string"
  value: "test_string2"
}

----------------------------------------
type(value):  <class 'list'>
map(type, value): <map object at 0x12ddbbdf0>
set(map(type, value)): {<class 'tensorflow.python.framework.ops.EagerTensor'>}

** PASSING AN EAGER-TENSOR value:
[<tf.Tensor: shape=(), dtype=string, numpy=b'tf.constant string 1'>, <tf.Tensor: shape=(), dtype=string, numpy=b'tf.constant string 2'>]

bytes_list {
  value: "tf.constant string 1"
  value: "tf.constant string 2"
}

tensor_from_numpy
<class 'tensorflow.python.framework.ops.EagerTensor'>
----------------------------------------
True


In [26]:
# https://stackoverflow.com/questions/62348605/how-to-parse-tensor-without-giving-out-type-in-tensorflow

# A scalar tf.string tensor containing the serialized input_tensor
serialized_tensor = tf.io.serialize_tensor(test_bytes_byte_list)

print(f"serialized_tensor:\n{serialized_tensor}")
HR()

print(f"type of serialized_tensor:\n{type(serialized_tensor)}")
HR()


# Represent a tensor as a proto.
# Create a TensorProto from serialized_tensor content
# Inputs: python scalar, python list, numpy ndarray, numpy scalar
# https://www.tensorflow.org/api_docs/python/tf/make_tensor_proto
tensor_proto = tf.make_tensor_proto(test_bytes_byte_list)
print(f"tensor_proto:\n{tensor_proto}\n")
print(f"type of tensor_proto:\n{type(tensor_proto)}\n")
HR()


print("Parse serialized protocol buffer data into this message., via tensor_proto.ParseFromString()\n")
# https://github.com/protocolbuffers/protobuf/blob/main/python/google/protobuf/message.py
tensor_proto.ParseFromString(serialized_tensor.numpy())
HR()


print("IsInitialized:")
print(tensor_proto.IsInitialized())
HR()


print("ListFields:")
pp.pprint(tensor_proto.ListFields())
HR()


print("dtype of tensor:")
print(tf.dtypes.as_dtype(tensor_proto.dtype))
HR()


# Read data back from tensor proto
tensor_parsed = tf.io.parse_tensor(
    serialized_tensor.numpy(),
    tf.dtypes.as_dtype(tensor_proto.dtype)
)

print("test_bytes_byte_list:")
print(test_bytes_byte_list)
HR()

print("tensor_parsed:")
print(tensor_parsed)
HR()

tf.debugging.assert_equal(
    test_bytes_byte_list, tensor_parsed, message="Assert failed.")


try:
    # Wrap the serialized tensor (a scalar byte-string) as a bytes feature
    result = bytes_feature(serialized_tensor)
except Exception as e:
    print(type(e).__name__)
    print(e)
else:
    print("*** Returned result from bytes_feature():")
    print(type(result))
    print(result)
    
    

serialized_tensor:
b'\x08\x07\x12\x04\x12\x02\x08\x02B\x0ctest_string1B\x0ctest_string2'
----------------------------------------
type of serialized_tensor:
<class 'tensorflow.python.framework.ops.EagerTensor'>
----------------------------------------
tensor_proto:
dtype: DT_STRING
tensor_shape {
  dim {
    size: 2
  }
}
string_val: "test_string1"
string_val: "test_string2"


type of tensor_proto:
<class 'tensorflow.core.framework.tensor_pb2.TensorProto'>

----------------------------------------
Parse serialized protocol buffer data into this message., via tensor_proto.ParseFromString()

----------------------------------------
IsInitialized:
True
----------------------------------------
ListFields:
[(<google.protobuf.pyext._message.FieldDescriptor object at 0x12dff5400>, 7),
 (<google.protobuf.pyext._message.FieldDescriptor object at 0x1115c3370>,
  dim {
  size: 2
}
),
 (<google.protobuf.pyext._message.FieldDescriptor object at 0x12ddbbac0>,
  [b'test_string1', b'test_string2'])]
-

In [27]:
# %timeit benchmarks a single statement, averaging over multiple runs (default: 7)
%timeit [x ** 2 for x in range(100000)] # the statement to time

39.7 ms ± 2.56 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)


In [28]:
%timeit assert set(map(type, tensor_parsed)) == {type(tf.constant(0))}

406 µs ± 32 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
