Performance warning when using pat2feat.get_features #47

Closed
jcklie opened this issue Jun 12, 2023 · 2 comments

jcklie commented Jun 12, 2023

Thank you for this wonderful library; it has worked really well so far. When I use pat2feat.get_features to extract features for many patterns, I get many warnings like this:

/sequential/pat2feat.py:79: PerformanceWarning: DataFrame is highly fragmented.  This is usually the result of calling `frame.insert` many times, which has poor performance.  Consider joining all columns at once using pd.concat(axis=1) instead. To get a de-fragmented frame, use `newframe = frame.copy()`
  df['feature_' + str(i)] = df.apply(lambda row: is_satisfiable_in_rolling(row['sequence'], pattern,

Would it be possible to generate the column data first and then build the dataframe in bulk by concatenating, instead of adding the columns one by one? Would it also be possible for pat2feat to return a numpy array directly, as pandas is often overkill?

#Sequences: 108
#Patterns: 4300

pandas==2.0.2
seq2pat==1.4.0
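
For illustration, the second request (returning a numpy array) could be sketched roughly as follows. This is only a sketch under assumptions, not seq2pat code: `contains_subsequence` is a hypothetical stand-in for the library's own check (the `is_satisfiable_in_rolling` call in the warning above), and `feature_matrix` is a made-up helper name. The idea is to fill one boolean array of shape (sequences, patterns) and, if a dataframe is still wanted, wrap it once at the end.

```python
import numpy as np
import pandas as pd


def contains_subsequence(sequence, pattern):
    """Hypothetical predicate: True if `pattern` occurs in `sequence`
    as an ordered, not necessarily contiguous, subsequence."""
    remaining = iter(sequence)
    # `item in remaining` advances the iterator, so order is respected.
    return all(item in remaining for item in pattern)


def feature_matrix(sequences, patterns):
    """Return a boolean array of shape (len(sequences), len(patterns))."""
    matrix = np.zeros((len(sequences), len(patterns)), dtype=bool)
    for j, pattern in enumerate(patterns):
        for i, sequence in enumerate(sequences):
            matrix[i, j] = contains_subsequence(sequence, pattern)
    return matrix


# Toy data, invented for this example.
sequences = [["A", "B", "C"], ["B", "A"], ["A", "C"]]
patterns = [["A", "C"], ["B"]]

features = feature_matrix(sequences, patterns)

# Optional: build the dataframe in a single step instead of column by column.
columns = ["feature_" + str(j) for j in range(len(patterns))]
features_df = pd.DataFrame(features, columns=columns)
```

Because the dataframe is created from the finished array in one call, no columns are inserted one at a time.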
@takojunior
Contributor

Thank you @jcklie for your interest in the library and for your feedback! The performance warning seems to be caused by inserting a large number of columns into the data frame one by one. Doing a bulk concatenation instead of inserting each column separately is a good suggestion and could speed up the process. We shall look into this in future library updates.
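
The bulk concatenation mentioned above might look roughly like the sketch below. It is not the library's implementation: `add_features_bulk` and `contains_subsequence` are hypothetical names, and the predicate merely stands in for the library's own satisfiability check. The sketch assumes the input frame has a `sequence` column, as in the warning's traceback, collects every feature column first, and then joins them with a single `pd.concat(axis=1)`, which is what the pandas warning recommends.

```python
import pandas as pd


def contains_subsequence(sequence, pattern):
    """Hypothetical predicate: True if `pattern` occurs in `sequence`
    as an ordered, not necessarily contiguous, subsequence."""
    remaining = iter(sequence)
    return all(item in remaining for item in pattern)


def add_features_bulk(df, patterns):
    """Return a copy of `df` with one boolean column per pattern,
    attached in a single concat instead of repeated inserts."""
    new_columns = {}
    for i, pattern in enumerate(patterns):
        # Bind the current pattern via a default argument to avoid
        # the late-binding closure pitfall inside the loop.
        new_columns["feature_" + str(i)] = df["sequence"].apply(
            lambda seq, p=pattern: contains_subsequence(seq, p)
        )
    feature_df = pd.DataFrame(new_columns, index=df.index)
    return pd.concat([df, feature_df], axis=1)


# Toy data, invented for this example.
df = pd.DataFrame({"sequence": [["A", "B", "C"], ["B", "A"], ["A", "C"]]})
patterns = [["A", "C"], ["B"]]

result = add_features_bulk(df, patterns)
```

The input frame is left unmodified; only the returned frame carries the `feature_*` columns.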

@skadio
Contributor

skadio commented Mar 25, 2024

Closing for now to revisit later on.

@skadio skadio closed this as completed Mar 25, 2024