Multi-step conversion + documentation changes + bug fixes + small code changes #206

Merged
AmroAlJundi merged 14 commits into develop from feature/multi_step_conversion
Oct 27, 2022

Conversation

AmroAlJundi
Contributor

@AmroAlJundi AmroAlJundi commented Oct 17, 2022

Added multi-step conversion to the library. More specific changes are:

  • Converter no longer finds a conversion function, but a conversion chain of function-context pairs.
  • Added a cached version of Convert that returns the intermediate formats when a multi-step conversion occurs.
  • Convert functions can now take a vector of contexts rather than a single destination context. Note: a version that takes a single context still exists for simplicity.
  • Added a BFS implementation for multi-step search. The cost of conversion is simply the number of conversion steps. In the future, once conversions start having different costs, we can implement Dijkstra's algorithm instead.
  • Modified FunctionMatcherMixin to work with multi-step conversion.
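
As a rough illustration of the BFS item above, the sketch below finds the chain with the fewest steps in a graph of direct conversions. It is not the library's implementation: `FormatName`, `ConversionGraph`, and `FindShortestChain` are hypothetical names, formats are represented by plain strings, and contexts are left out entirely.

```cpp
#include <cassert>
#include <map>
#include <queue>
#include <string>
#include <vector>

// Hypothetical stand-ins: a format is named by a string, and an edge means
// "a direct conversion function exists from the key to each listed format".
using FormatName = std::string;
using ConversionGraph = std::map<FormatName, std::vector<FormatName>>;

// Breadth-first search for the chain with the fewest conversion steps.
// Returns the formats visited from `source` to `target` (both included),
// or an empty vector if `target` cannot be reached.
std::vector<FormatName> FindShortestChain(const ConversionGraph& graph,
                                          const FormatName& source,
                                          const FormatName& target) {
  std::map<FormatName, FormatName> parent;  // doubles as the visited set
  std::queue<FormatName> frontier;
  parent[source] = source;
  frontier.push(source);
  while (!frontier.empty()) {
    FormatName current = frontier.front();
    frontier.pop();
    if (current == target) break;
    auto it = graph.find(current);
    if (it == graph.end()) continue;  // no outgoing conversions
    for (const FormatName& next : it->second) {
      if (parent.count(next) == 0) {
        parent[next] = current;
        frontier.push(next);
      }
    }
  }
  if (parent.count(target) == 0) return {};
  // Walk the parent links back from the target, then reverse the result.
  std::vector<FormatName> reversed;
  for (FormatName at = target; at != source; at = parent.at(at)) {
    reversed.push_back(at);
  }
  reversed.push_back(source);
  assert(!reversed.empty());
  return std::vector<FormatName>(reversed.rbegin(), reversed.rend());
}
```

Since every edge counts as one step, BFS is enough to find a cheapest chain; once steps carry different costs, the queue would have to become a priority queue, which is the Dijkstra variant mentioned in the list.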

Additional changes:

  • Fixed a bug in the CMake macro add_opt_library in header-only mode.
  • Changed template parameters in Reorder and GraphMetric bases to AutoX to indicate users shouldn't touch them.
  • Documented Converter further.
  • Added typedefs in Converter to make code easier to read.
  • Documented IOBase.
  • Rephrased the definition of the inverse permutation generated by reordering.
  • Moved preprocessor parameter structs outside their preprocess classes.
  • Fixed the installation, which was not installing utils.h.
  • Updated tutorial 1 to use bases.
  • Added a page containing a list of available components (formats and functionalities).

@AmroAlJundi AmroAlJundi added priority: now Critical priority state: review needed type: feature Brand new functionality, features, workflows, endpoints, etc labels Oct 17, 2022
@AmroAlJundi AmroAlJundi self-assigned this Oct 17, 2022
@AmroAlJundi AmroAlJundi linked an issue Oct 17, 2022 that may be closed by this pull request
@AmroAlJundi AmroAlJundi added type: fix Iterations on existing features or infrastructure. Optimizations, refactoring, etc. and removed type: feature Brand new functionality, features, workflows, endpoints, etc labels Oct 17, 2022
@AmroAlJundi AmroAlJundi changed the title Multi-step conversion Multi-step conversion + documentation changes Oct 19, 2022
@AmroAlJundi AmroAlJundi changed the title Multi-step conversion + documentation changes Multi-step conversion + documentation changes + bug fixes Oct 20, 2022
@AmroAlJundi AmroAlJundi changed the title Multi-step conversion + documentation changes + bug fixes Multi-step conversion + documentation changes + bug fixes + small code changes Oct 20, 2022
@AmroAlJundi AmroAlJundi added state: pending Taking action type: docs Related to documentation and information type: feature Brand new functionality, features, workflows, endpoints, etc and removed state: review needed labels Oct 22, 2022
@AmroAlJundi AmroAlJundi force-pushed the feature/multi_step_conversion branch 2 times, most recently from acb9635 to 18c8622 Compare October 24, 2022 09:24
Every preprocessing algorithm in the library (reordering, partitioning, feature extraction, etc.) can be implemented for many format types. In this case, `RCMReorder` is only implemented for the `CSR` type. However, setting the `convert_input` parameter (the last parameter) to `true` in the call allows function matching to take place.

Function matching takes the input formats of a preprocessing call and the contexts the user passes to that call. If the input format can't be used directly for the preprocessing (i.e., no function exists for the input's format type), it attempts to convert the input (using the passed contexts) to a format for which an implementation exists in the preprocessing.
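
The matching rule described above can be sketched roughly as follows. This is a simplified model under stated assumptions, not the library's code: `MatchFormat`, `FormatName`, and the `reachable` map are hypothetical, formats are named by strings, and the set of formats reachable through the passed contexts is assumed to be precomputed.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>

using FormatName = std::string;  // hypothetical: a format type named by a string

// Returns the format the preprocessing would run on, or an empty string if
// no implementation can be matched.
//   implemented: format types the preprocessing has a function for
//   reachable:   for each format, the formats it can be converted to using
//                the contexts the user passed (assumed to be precomputed)
FormatName MatchFormat(
    const std::set<FormatName>& implemented,
    const std::map<FormatName, std::set<FormatName>>& reachable,
    const FormatName& input, bool convert_input) {
  // Direct match: the input can be used as it is, no conversion needed.
  if (implemented.count(input) != 0) return input;
  // Without convert_input, matching stops at the direct check.
  if (!convert_input) return "";
  auto it = reachable.find(input);
  if (it == reachable.end()) return "";
  // Pick the first reachable format that has an implementation.
  for (const FormatName& candidate : it->second) {
    if (implemented.count(candidate) != 0) return candidate;
  }
  return "";
}
```

In this model, a preprocessing implemented only for CSR accepts a COO input only when `convert_input` is true and CSR appears among the formats reachable from COO.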

Contributor

Maybe specifically mention what conversion via the context means. So in this case the CSR actually gets copied to the CPU.

Contributor Author

Makes sense.

ConversionCondition;

//! A single conversion step composed of a conversion function and a context to use for the function
typedef std::tuple<ConversionFunction, context::Context *> ConversionStep;

Contributor

Is there a particular reason this is not an std::pair? Is this where we would add weights?

Contributor Author

No, there isn't really a reason. It can be a pair. I opted to associate the cost with an entire chain rather than a step, but now that I think about it, it's more general to associate steps with costs. I'll add that.

Contributor Author

That cost is currently not being used, but in the future, it might become useful.
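
A minimal sketch of what a per-step cost could look like, assuming the step becomes a three-element tuple. `Format`, `Context`, the signature of `ConversionFunction`, and the `ChainCost` helper are hypothetical placeholders here, not the library's actual definitions.

```cpp
#include <cassert>
#include <functional>
#include <tuple>
#include <vector>

// Hypothetical placeholders for the library's format and context types.
struct Format {};
struct Context {};

// Assumed shape of a conversion function; the real signature may differ.
using ConversionFunction = std::function<Format*(Format*, Context*)>;

// A step now carries its own cost as a third tuple element.
using ConversionStep = std::tuple<ConversionFunction, Context*, unsigned>;
using ConversionChain = std::vector<ConversionStep>;

// The cost of a chain is the sum of the costs of its steps.
unsigned ChainCost(const ConversionChain& chain) {
  unsigned total = 0;
  for (const ConversionStep& step : chain) {
    assert(std::get<0>(step) && "each step needs a conversion function");
    total += std::get<2>(step);
  }
  return total;
}
```

With every step given a cost of 1, `ChainCost` reduces to the number of steps, which matches the cost the BFS search uses.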

* \return a vector of formats with the first being the original format, the last being the target format,
* and the rest being intermediate formats. If a conversion is empty or false, only returns the original format.
*/
static std::vector<format::Format*> ApplyConversionChain(const ConversionChain& chain,

Contributor

So if move conversion is enabled, only the last format of this vector is valid. But if it's not, doesn't it contain duplicates?

Contributor Author

It does. Technically, all the formats in the chain will be different representations of the same logical Format object. However, if users don't want these formats, they will just get deleted at the next step in the execution (Execute).

I see now that users might want to save memory by deleting a format as soon as it is not needed. Maybe we can replace the move options with such an option? I mean, if someone wants to move convert, they probably don't want the intermediate formats, right? What do you think?

Contributor

@ardasener ardasener Oct 26, 2022

Yeah, that was my thought process as well.

Contributor Author

Makes sense. Let me add that.

Contributor Author

Okay, so here's what I did:

  • For ApplyConversionChain, I replaced is_move_conversion with clear_intermediate.
    • is_move_conversion wasn't being used to begin with since at that point, the conversion functions were already decided by GetConversionChain.
    • clear_intermediate would delete intermediate formats after each step, keeping only the start format and destination format.
  • Replaced is_move_conversion with clear_intermediate in ApplyConversionSchema for the same reason as above.
  • In FunctionMatcher, added a clear_intermediate parameter to CachedExecute. Here is what happens when it's true and when it's false:
    • clear_intermediate == true: if the chain size is > 1, all the formats between the first and the last are deleted, and only the destination format is returned. (In contrast, Execute doesn't return any formats.)
    • clear_intermediate == false: the destination format and all the intermediate formats are returned.
  • In FunctionMatcher in Execute, the call to CachedExecute will always clear intermediate formats. Then inside Execute, the destination formats are also deleted.
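
The clear_intermediate behavior listed above could look roughly like the sketch below. It is only an illustration under stated assumptions: `Format`, `ConversionFunction`, and `ApplyChain` are hypothetical, contexts are omitted, and every step is assumed to allocate a new object that the caller ends up owning.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

struct Format { std::string name; };  // hypothetical placeholder type

// Assumed shape of a step: it allocates and returns a new format object.
using ConversionFunction = std::function<Format*(const Format*)>;

// Applies the steps in order. The result always starts with `source`, which
// this function never deletes. With clear_intermediate set, each intermediate
// format is deleted as soon as the next step has consumed it, so only the
// source and the destination remain in the result.
std::vector<Format*> ApplyChain(const std::vector<ConversionFunction>& chain,
                                Format* source, bool clear_intermediate) {
  std::vector<Format*> formats{source};
  Format* current = source;
  for (std::size_t i = 0; i < chain.size(); ++i) {
    Format* next = chain[i](current);
    assert(next != nullptr && next != current);
    // `current` is an intermediate if it was produced by an earlier step.
    if (clear_intermediate && current != source) {
      delete current;
      formats.pop_back();
    }
    formats.push_back(next);
    current = next;
  }
  return formats;
}
```

For a two-step chain such as CSC -> COO -> CSR, the call returns three formats when `clear_intermediate` is false and two (the source and the CSR result) when it is true. Whether a step moves or copies arrays is a separate concern that this sketch does not model.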

Now, I want to retract my statement that "if someone wants to move convert they probably don't want the intermediate formats." That can actually be the case. Imagine move converting CSC->CSR. This will do CSC -> COO -> CSR. However, if move conversions are used, we will end up with three objects but only 6 arrays:
row_ptr (CSR and COO share this)
col_ptr (CSC)
col (CSR and COO share this)
vals (CSR and COO share this)
row (CSC)
vals_csc (CSC)
(CSC vals and the other two vals are different because of how edges are sorted)

Which is what the "view" idea would've required. So I think clearing intermediate and moving should be separate ideas.

What do you think?

Contributor

Ok looks good to me

Contributor

@ardasener ardasener left a comment

It's good. If you fix the small nitpicky things I requested and answer my questions I can approve.

@ardasener ardasener added state: revision needed Requires additional work before next review and removed state: review needed labels Oct 26, 2022
@AmroAlJundi AmroAlJundi added state: review needed and removed state: revision needed Requires additional work before next review labels Oct 27, 2022
@AmroAlJundi AmroAlJundi merged commit cd8f202 into develop Oct 27, 2022
@AmroAlJundi AmroAlJundi deleted the feature/multi_step_conversion branch October 27, 2022 10:30
Labels
priority: now Critical priority state: review needed type: docs Related to documentation and information type: feature Brand new functionality, features, workflows, endpoints, etc type: fix Iterations on existing features or infrastructure. Optimizations, refactoring, etc.
2 participants