Fixes the swap of data and feature dimension to work in the general case. #214

Merged 1 commit on Apr 12, 2023

kshiteejm
Collaborator

The previous implementation was broken: it used a transpose, which assumes that `data_list` is a 2D array.

However, when all the feature value arrays have the same length, `data_list` can be a 3D array: the call `data_list = np.array(list(dataset.as_numpy_iterator()), dtype=object)` merges the inner NumPy arrays and turns `data_list` into one big 3D array.
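
To illustrate the failure mode, here is a minimal sketch. It assumes the old code transposed with `np.transpose` (or `.T`), and `records` is a hypothetical stand-in for `list(dataset.as_numpy_iterator())`; neither is taken from the actual change.

```python
import numpy as np

# Hypothetical stand-in for list(dataset.as_numpy_iterator()):
# 3 traces, 2 features per trace, every feature value has length 4.
records = [[np.ones(4), np.zeros(4)] for _ in range(3)]

# Because all inner arrays have the same length, NumPy merges them
# into a single 3D object array instead of a 2D array of arrays.
data_list = np.array(records, dtype=object)
print(data_list.shape)

# A plain transpose reverses ALL axes, so the feature values end up on
# axis 0 instead of the features.
transposed = np.transpose(data_list)
print(transposed.shape)
```

With a 2D input, a plain transpose and a swap of the first two axes coincide, which is why the problem only shows up in the uniform-length case.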

@kshiteejm kshiteejm requested a review from mtrofin April 12, 2023 02:05
google-cla bot commented Apr 12, 2023

Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

View this failed invocation of the CLA check for more information.

For the most up to date status, view the checks section at the bottom of the pull request.

Collaborator

mtrofin commented Apr 12, 2023

Thanks, @kshiteejm!

To capture the offline discussion and @kshiteejm's example:

This is simply how NumPy behaves:

>>> np.array(list([[np.ones(2)], [np.ones(2)], [np.ones(2)]]), dtype=object)
array([[[1.0, 1.0]],

       [[1.0, 1.0]],

       [[1.0, 1.0]]], dtype=object)

>>> np.array(list([[np.ones(2)], [np.ones(2)], [np.ones(3)]]), dtype=object)
array([[array([1., 1.])],
       [array([1., 1.])],
       [array([1., 1., 1.])]], dtype=object)

In our case, the data is shaped as:

  • 1st dimension is traces (i.e. 1 per module)
  • 2nd dimension is features
  • 3rd dimension is feature tensor values

In the general case the feature values have different shapes, so the data is 2D with object values (the objects being arrays of various sizes). But if all feature values have exactly the same length, NumPy merges them and the result is a 3D array.
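
As a sketch of a swap that is agnostic to this difference (not necessarily the exact change merged here), `np.swapaxes` exchanges only the first two axes and leaves any trailing axis alone. The helper name `swap_trace_and_feature_axes` is hypothetical and exists only for this example.

```python
import numpy as np


def swap_trace_and_feature_axes(data_list):
    """Hypothetical helper: swap axis 0 (traces) with axis 1 (features).

    Works for the 2D object-array case and for the merged 3D case,
    because any trailing axis is left where it is.
    """
    return np.swapaxes(data_list, 0, 1)


# Uniform case: 3 traces, 2 features, all feature values of length 2.
# NumPy merges this into a 3D array of shape (3, 2, 2).
uniform = np.array([[np.ones(2), np.ones(2)] for _ in range(3)], dtype=object)

# General case: the two features have different lengths (2 and 5), so the
# result stays a 2D array of shape (3, 2) whose elements are arrays.
ragged = np.array([[np.ones(2), np.ones(5)] for _ in range(3)], dtype=object)

swapped_uniform = swap_trace_and_feature_axes(uniform)
swapped_ragged = swap_trace_and_feature_axes(ragged)

# In both cases features are now on axis 0 and traces on axis 1.
print(uniform.shape, "->", swapped_uniform.shape)
print(ragged.shape, "->", swapped_ragged.shape)
```

In both cases axis 0 indexes features and axis 1 indexes traces after the swap, so downstream code can iterate per feature without caring whether NumPy merged the innermost arrays.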

Thanks, @kshiteejm, for this clarification!

@mtrofin mtrofin merged commit 0f24e63 into main Apr 12, 2023
@mtrofin mtrofin deleted the kshiteejm-patch-1 branch April 12, 2023 14:19