Performance warning when using pat2feat.get_features #47

jcklie · 2023-06-12T04:38:21Z

Thank you for this wonderful library, it works really well so far. When using pat2feat.get_features to extract features for many patterns, I get many warnings like this:

/sequential/pat2feat.py:79: PerformanceWarning: DataFrame is highly fragmented.  This is usually the result of calling `frame.insert` many times, which has poor performance.  Consider joining all columns at once using pd.concat(axis=1) instead. To get a de-fragmented frame, use `newframe = frame.copy()`
  df['feature_' + str(i)] = df.apply(lambda row: is_satisfiable_in_rolling(row['sequence'], pattern,

Would it be possible to generate the column data first and then build the DataFrame in one step by concatenating, instead of adding the columns one by one? Would it also be possible for pat2feat to return a NumPy array directly, since pandas is often overkill?

#Sequences: 108
#Patterns: 4300

pandas==2.0.2
seq2pat==1.4.0

takojunior · 2023-06-12T15:30:29Z

Thank you @jcklie for your interest in the library and for your feedback! The performance warning seems to be caused by inserting a large number of columns into the data frame one by one. Doing a single bulk concatenation instead of inserting each column separately is a good suggestion and could speed up the process. We will look into this in future library updates.
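
For illustration, here is a minimal sketch of the bulk approach discussed above. It is not the library's implementation: `contains_subsequence` is a hypothetical stand-in for the per-row check done in pat2feat.py, and `build_feature_matrix` and `build_feature_frame` are made-up helper names. The idea is to fill a NumPy array first and attach all feature columns with one `pd.concat(axis=1)`, which avoids the fragmentation that the warning describes.

```python
import numpy as np
import pandas as pd


def contains_subsequence(sequence, pattern):
    """Hypothetical stand-in for the library's per-row check.

    Returns True if the items of `pattern` occur in `sequence` in order,
    not necessarily contiguously.
    """
    it = iter(sequence)
    # `item in it` advances the iterator, so order is respected.
    return all(item in it for item in pattern)


def build_feature_matrix(sequences, patterns):
    """Return a boolean array with one row per sequence and one column per pattern."""
    matrix = np.zeros((len(sequences), len(patterns)), dtype=bool)
    for j, pattern in enumerate(patterns):
        for i, sequence in enumerate(sequences):
            matrix[i, j] = contains_subsequence(sequence, pattern)
    return matrix


def build_feature_frame(sequences, patterns):
    """Build the feature columns in bulk and attach them with a single concat."""
    matrix = build_feature_matrix(sequences, patterns)
    columns = ["feature_" + str(j) for j in range(len(patterns))]
    features = pd.DataFrame(matrix, columns=columns)
    base = pd.DataFrame({"sequence": sequences})
    # One concat instead of one column insertion per pattern.
    return pd.concat([base, features], axis=1)
```

Callers who only need the raw array could use `build_feature_matrix` directly and skip pandas entirely, which also covers the second request in the issue.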

skadio · 2024-03-25T20:07:19Z

Closing for now to revisit later on.

skadio closed this as completed Mar 25, 2024
