Feature: select+joins analysis with pandas DataFrames by il-ia-ni · Pull Request #4 · il-ia-ni/python_sql_tutorial

il-ia-ni · 2023-02-02T13:26:24Z

Hey Phillip :)

This is my first commit to the branch for testing pandas DataFrames with the SQLAlchemy result lists of select+join queries.

Right now there seem to be problems with filling the DataFrames with data, even though the SQLAlchemy lists are correctly parsed into DataFrames. This probably has something to do with the detached state of the list items upon closing the Session.

Here are the links to the topic:

Error: Parent instance is not bound to a Session; (lazy load/deferred load/refresh/etc.) operation cannot proceed
State Management of SQL Alchemy query results (only for ORM-instances): https://docs.sqlalchemy.org/en/14/orm/session_state_management.html#session-object-states
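
One way this error can arise is sketched below, under assumptions. If a method commits and then closes its Session, the loaded ORM instances are expired and detached. Any later attribute access would need a refresh that can no longer happen. The `Event` model and the in-memory SQLite engine are hypothetical stand-ins, not the project's tables. Copying plain values out while the Session is still open is one way around it.

```python
from sqlalchemy import Column, Integer, String, create_engine, select
from sqlalchemy.orm import Session, declarative_base
from sqlalchemy.orm.exc import DetachedInstanceError
from sqlalchemy.pool import StaticPool

Base = declarative_base()


class Event(Base):  # hypothetical model, for illustration only
    __tablename__ = "event"
    id = Column(Integer, primary_key=True)
    name = Column(String)


# In-memory SQLite with a single shared connection (assumption for the demo)
engine = create_engine("sqlite://", poolclass=StaticPool)
Base.metadata.create_all(engine)

with Session(engine) as session:
    session.add(Event(id=1, name="demo"))
    session.commit()

# Failing pattern: commit() expires the loaded attributes, and leaving the
# "with" block closes the Session, which detaches the instances.
with Session(engine) as session:
    events = session.execute(select(Event)).scalars().all()
    session.commit()

try:
    events[0].name  # needs a refresh, but there is no Session any more
    detached_error = False
except DetachedInstanceError:
    detached_error = True

# Working pattern: copy plain Python values while the Session is still open.
with Session(engine) as session:
    rows = [(e.id, e.name) for e in session.execute(select(Event)).scalars()]

print(detached_error, rows)
```

Another option would be to create the Session with `expire_on_commit=False`, so that already loaded attributes stay readable after the commit. Whether that fits depends on how fresh the data needs to be.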

…Instances Dataframes get no data as the underlying instances are detached upon closing session within the methods! https://docs.sqlalchemy.org/en/14/errors.html#error-bhk3 https://docs.sqlalchemy.org/en/14/orm/session_state_management.html#session-object-states

main.py

il-ia-ni · 2023-02-15T14:50:13Z

I'm not sure about this one:

The function is surely meant for testing purposes. But we need to check/research again whether it is standard practice to call session.execute() three times in one function. And since we don't use session.add_all(), I wonder whether the session.commit() is necessary.

I'm not entirely sure about this either! Since SQLAlchemy always takes the "commit as you go" approach in its documentation (as they write themselves), I adopted it in my test method as well. Otherwise I still have to find out how to apply this .begin() to a Session instance...

Here is what the SQLAlchemy documentation says about the difference between the "commit as you go" and "begin once" commit styles:

“Begin once” style is often preferred as it is more succinct and indicates the intention of the entire block up front. However, within this tutorial we will normally use “commit as you go” style as it is more flexible for demonstration purposes.
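
The two styles from the quote can be compared side by side in a minimal sketch. The `test_obj` table and the in-memory SQLite engine are assumptions made for illustration only. `Session.begin()` is used as a context manager that commits on success and rolls back on an exception.

```python
from sqlalchemy import create_engine, text
from sqlalchemy.orm import Session
from sqlalchemy.pool import StaticPool

# In-memory SQLite with a single shared connection (assumption for the demo)
engine = create_engine("sqlite://", poolclass=StaticPool)

with engine.begin() as conn:
    conn.execute(text("CREATE TABLE test_obj (id INTEGER PRIMARY KEY, name TEXT)"))

insert_stmt = text("INSERT INTO test_obj (id, name) VALUES (:id, :name)")

# "Commit as you go": commit explicitly whenever a unit of work is done.
with Session(engine) as session:
    session.execute(insert_stmt, {"id": 1, "name": "first"})
    session.commit()
    session.execute(insert_stmt, {"id": 2, "name": "second"})
    session.commit()

# "Begin once": the whole block is one transaction, committed when the
# block ends without an exception and rolled back otherwise.
with Session(engine) as session, session.begin():
    session.execute(insert_stmt, {"id": 3, "name": "third"})
    session.execute(insert_stmt, {"id": 4, "name": "fourth"})

# A read-only block has nothing to persist, so no commit() is called here.
with Session(engine) as session:
    row_count = session.execute(text("SELECT COUNT(*) FROM test_obj")).scalar_one()

print(row_count)
```

Calling session.execute() several times inside one block is fine in both styles. For a function that only runs SELECT statements, a commit persists nothing and mainly ends the open transaction.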

il-ia-ni · 2023-02-16T15:06:08Z

Hey @Phillip-Oliver, I uploaded all the changes today!
Among other things, all of my methods for creating DataFrames have now been moved into the data_analysis package.

I also reworked both of your scripts (defect_event_root_cause and defect_root_cause_pairs) inside the data_analysis package and found out a few interesting things along the way.

My test database has also completely got rid of the schema with the wrong name. All tables, procedures and functions can now be found under the schema "Main". That should make it possible to close this PR, and then I can finish the next PR with the local fallback SQLite DB!

So, once again, your comments please :)

Phillip-Oliver

Solid work!
You can go through the comments. But I think we can merge it soon ;)

Phillip-Oliver · 2023-02-23T07:36:48Z

data_analysis/dataframes.py

+def create_df_from_list(rows_data_list: list, columns_list: list) -> pd.DataFrame:
+    """
+    Creates a pandas DataFrame based on a list of selection result Rows instances that were already created using f.e.
+    SQL Alchemy Core API or ORM API
+    :param rows_data_list: a list of Rows instances of a select result
+    :param columns_list: an iterable with the strings of attributes names of the select result
+    :return: a pandas dataframe with the data of the list with the selection result
+    """
+    result_df = pd.DataFrame(rows_data_list, columns=columns_list)
+    logger.debug("A DataFrame with following parameters was created from the list: \n", result_df.info())
+
+    return result_df


A separate function just for converting a list into a DF? ;)

Well, thanks for the hint 😆 While improving the code it somehow completely slipped my mind that I wanted to do something different here... My idea was actually to introduce a static type check as well. Otherwise the method makes no sense at all.

I have now extended it with union type annotations so that only SQLAlchemy Rows are accepted. Here is the commit for it. I also adjusted the method name.

In theory, further validation of the Rows could be added before the DataFrames are created. In any case, it was a good exercise in Python type annotations 😉
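
A minimal sketch of what such a typed and validated helper could look like follows. It is not the code from the commit. The name `create_df_from_rows`, the validation rules and the use of the standard `logging` module instead of loguru are assumptions. Type annotations alone are only checked by static tools, so the sketch adds a runtime `isinstance` check. It also captures `DataFrame.info()` through its `buf` argument, because `info()` prints its report and returns `None`.

```python
import io
import logging
from typing import List, Sequence, Tuple, Union

import pandas as pd
from sqlalchemy import create_engine, text
from sqlalchemy.engine import Row

logger = logging.getLogger(__name__)

# On Python 3.10+ this can be spelled list[Row] | tuple[Row, ...] (PEP 604).
RowCollection = Union[List[Row], Tuple[Row, ...]]


def create_df_from_rows(rows: RowCollection, columns: Sequence[str]) -> pd.DataFrame:
    """Build a DataFrame from SQLAlchemy Row objects after validating them."""
    for row in rows:
        if not isinstance(row, Row):
            raise TypeError(f"expected a SQLAlchemy Row, got {type(row).__name__}")
        if len(row) != len(columns):
            raise ValueError(f"row has {len(row)} values, expected {len(columns)}")

    df = pd.DataFrame([tuple(row) for row in rows], columns=list(columns))

    # info() writes to a buffer instead of returning a string
    buffer = io.StringIO()
    df.info(buf=buffer)
    logger.debug("DataFrame created from rows:\n%s", buffer.getvalue())
    return df


# Usage with an in-memory SQLite engine (hypothetical data)
engine = create_engine("sqlite://")
with engine.connect() as conn:
    rows = conn.execute(text("SELECT 1 AS event_id, 'a' AS signal_id")).all()

df = create_df_from_rows(rows, ["event_id", "signal_id"])
print(df)
```

Plain tuples are rejected on purpose here, which gives the function a job beyond calling `pd.DataFrame` directly.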

Phillip-Oliver · 2023-02-23T07:41:44Z

data_analysis/defect_root_cause_pairs.py

+create_root_cause_groups_text_clause = text(
+    "WITH A AS ("
+    "   SELECT A.create_date, A.event_id, B.signal_id, A.behaviour_pattern_id, A.strand_id, A.slab_id, B.importance "
+    "   FROM main.defect_event A INNER JOIN main.defect_root_cause B "
+    "       on A.event_id=B.event_id "
+    "   WHERE A.event_id IN ("
+    "       SELECT event_id "
+    "       FROM main.defect_root_cause "
+    "       WHERE signal_id NOT in :signals "
+    "           AND slab_id IS NOT null "
+    "           AND create_date BETWEEN CAST(:start_date AS DATE) AND GETDATE() "
+    "       GROUP BY event_id HAVING COUNT(event_id) > 1 AND COUNT(event_id) < 3)"
+    "   )"
+    "SELECT "
+    "   create_date, "
+    "   event_id, "
+    "   behaviour_pattern_id, "
+    "   strand_id, "
+    "   slab_id, "
+    "   importance,"
+    "   STUFF((SELECT ',' + CAST(signal_id AS varchar(1000)) "
+    "       FROM A AS innerTable "
+    "       WHERE innerTable.event_id = p.event_id "
+    "       FOR XML PATH('')),1,1,'') "
+    "AS signal_pattern "
+    "FROM A AS p "
+    "GROUP BY p.event_id, create_date, behaviour_pattern_id, strand_id, slab_id, importance")
+
+
+def root_cause_pairs_query(date: str, signals: tuple[str, ...]) -> str:
+    # tuple[str] vs tuple[str, ...] see: https://stackoverflow.com/questions/72001132/python-typing-tuplestr-vs-tuplestr
+
+    # f"" is formatted string literal. An overview of Python String formatting: https://realpython.com/python-f-strings/
+    # f""" """ allows writing a multiline string with no escape character \ before each new line


And which of the two works better? :)

The f-strings are definitely the best solution! 😄
The only thing to watch out for is that the strings are sometimes converted to integers by SQLAlchemy, even when the value is actually a date... For the line with the scalar variable "date" I solved this as follows:
AND create_date BETWEEN CAST(\'{date}' AS DATE) and GETDATE()

That is still 1000 times easier than having to use all the binding methods for non-scalar variables in text constructs... But it was definitely worth the exercise 📌
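
One plausible cause of the integer conversion, stated here as an assumption, is that an unquoted `2023-01-01` inside an f-string reaches the database as the arithmetic expression `2023 - 01 - 01`. The sketch below reproduces this on an in-memory SQLite database, not on the SQL Server instance from the PR. It then shows the same value passed as a bound parameter, which needs no manual quoting and avoids interpolating values into the SQL text.

```python
from sqlalchemy import create_engine, text

engine = create_engine("sqlite://")  # in-memory database, assumption for the demo
date = "2023-01-01"

with engine.connect() as conn:
    # f-string without quotes: the database evaluates 2023 - 01 - 01
    unquoted = conn.execute(text(f"SELECT {date}")).scalar_one()

    # Bound parameter: the driver transmits the value as a string
    bound = conn.execute(
        text("SELECT :start_date"), {"start_date": date}
    ).scalar_one()

print(unquoted)  # 2021
print(bound)     # 2023-01-01
```

Interpolating values with f-strings also leaves the query open to SQL injection if a value ever comes from outside the script. Bound parameters avoid that.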

Phillip-Oliver · 2023-02-23T07:43:28Z

data_analysis/defect_root_cause_pairs.py

+    query = root_cause_pairs_query(date=date, signals=signals_to_filter)
+    with session as s:
+        # result = s.execute(text_statement, params={'signals': signals_to_filter, 'start_date': date})
+        result = s.execute(query)


Would it maybe also be possible to build in a filter for caster?
It just occurred to me!

The casters would be '2c' or '1'

I have now introduced the filter in the extract_root_cause_pairs script. Here is the commit for it.

However, there is a fairly big problem with SQLAlchemy statements written as f-strings: a Python tuple that holds only one value must have a trailing comma after that value. SQL cannot handle such a tuple in a "WHERE IN ('1', )" clause.

I therefore added TODOs, because otherwise the same value has to be passed in twice (see line 128) 😱
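
A possible way around the single-value problem is sketched below, under assumptions. SQLAlchemy's `bindparam(..., expanding=True)` renders one placeholder per element at execution time, so one-element and multi-element collections are handled the same way. The `defect_event` table contents and the in-memory SQLite engine are made up for the demo.

```python
from sqlalchemy import bindparam, create_engine, text
from sqlalchemy.pool import StaticPool

# In-memory SQLite with a single shared connection (assumption for the demo)
engine = create_engine("sqlite://", poolclass=StaticPool)

with engine.begin() as conn:
    conn.execute(text("CREATE TABLE defect_event (event_id INTEGER, caster TEXT)"))
    conn.execute(
        text("INSERT INTO defect_event (event_id, caster) VALUES (:event_id, :caster)"),
        [
            {"event_id": 1, "caster": "1"},
            {"event_id": 2, "caster": "2c"},
            {"event_id": 3, "caster": "3"},
        ],
    )

# The f-string problem: the tuple repr keeps its trailing comma.
broken_sql = f"SELECT event_id FROM defect_event WHERE caster IN {('1',)}"
print(broken_sql)  # ... WHERE caster IN ('1',)

# Expanding bind parameter: note there are no parentheses around :casters.
stmt = text(
    "SELECT event_id FROM defect_event WHERE caster IN :casters ORDER BY event_id"
).bindparams(bindparam("casters", expanding=True))

with engine.connect() as conn:
    single = conn.execute(stmt, {"casters": list(("1",))}).scalars().all()
    several = conn.execute(stmt, {"casters": list(("1", "2c"))}).scalars().all()

print(single, several)
```

With this approach the value no longer has to be passed twice, and the tuple never appears in the SQL text itself.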

Phillip-Oliver · 2023-02-23T07:46:09Z

main.py

-global_engine = make_engine(url, 1000)
-global_session = get_session(global_engine)
+debug_format()  # adds a custom format for debug-level of loguru (saving local .log-files)
+global_session = get_session(sqlserver_engine)


I think I would put these plotting things into a separate module, i.e. not in main.py.

Have you tried with test data whether it runs?

Yep, I have now cleared the test code out of main.py and created a plots.py script for the Matplotlib-based methods. I moved the method for grouping the signal data of events into the dataframes.py script. Here is the commit for it.

There are now also docstrings for every method, as well as a few TODOs 😃

Unfortunately I can no longer upload screenshots to GitHub from the Master Client... 🙄 After restructuring the database, I actually pulled signal_data from the customer DB with a few small datasets and created plots! Everything works well 🏅

Ilia Nikolaenko added 17 commits January 26, 2023 16:16

Added comments to the sessionmaker in progress

0f84bb9

Merge branch 'main' into feature-move_sessions_2_sessionmaker

0c18088

upd loguru rotation for custom debug logs

aec7319

Update debug_formatter.py

ba87249

Update debug_formatter.py

c2502cd

Update debug_formatter.py

efa4679

upd simple_select methods to support global session instances

0173402

Upd select_joins methods to support global session instances

6e4b55f

Tried adding a seccond logging formatter for lower-lvl logging

e2bd01f

Refactor adding/deleting new instances logic. added TODOs

98cafda

Update main.py

c78f4b0

Added severity level filet for the custom loguru formatter

91e15cc

revised loguru severities after filtering of format

74ab9d0

Update creating_tables.py

80b98b1

small upds for working with the branch

c7e23c0

unimportant fix

a3f9810

il-ia-ni requested a review from Phillip-Oliver February 2, 2023 13:26

il-ia-ni self-assigned this Feb 2, 2023

Ilia Nikolaenko added 3 commits February 7, 2023 10:18

Added dataframe data extractions

e622076

Update main.py

7c117ee

Upd select_joins with core statements

7cb09bc

il-ia-ni marked this pull request as ready for review February 7, 2023 10:01

Ilia Nikolaenko added 6 commits February 7, 2023 11:26

Extended attrs list for selections with join

41deb8e

upd docstrings

82c5524

upd select_joins orm method

2b43ccb

Merge branch 'main' into feature-joins-analysis-w-pandas

eb6a153

moved testinmg of adding and deleting objs to adding_data.py

ab4c037

restored pandas testing in main

fffdb56

Phillip-Oliver reviewed Feb 9, 2023

main.py (outdated)

Ilia Nikolaenko added 3 commits February 15, 2023 14:35

formatted main.py

e27c15a

Added todo for unit tests for future prs

c6f25e0

Updated adding_data method with the splat operator to support any amo…

478b379

…unt of test_objs https://realpython.com/python-kwargs-and-args/

Ilia Nikolaenko added 4 commits February 16, 2023 11:21

Upd table definitions to the main scheme of the test DB

bd152fa

Added refactored defect_event_root_cause script

0346386

Package: data_analysis

Added own defect_root_cause_pairs script

6f37d02

To the data analysis package Fixed the problem with binding non-scalar tuple as a parameter of a text query. https://stackoverflow.com/a/56382828

Moved test lines to dataframes.py and upd main.py

a0aa9fd

Ilia Nikolaenko added 11 commits February 17, 2023 11:22

Update defect_root_cause_pairs.py

1796d81

Update creating_tables.py

053411d

Update dataSources.xml

be35c6e

Create sshConfigs.xml

67f557a

Update creating_tables.py

04bd7c0

Update dataframes.py

92a5192

Update creating_tables.py

d91009a

Update main.py

115e109

Upd scripts with testing of group creation from jupiter notebook

0a516cb

Added matplotlib to the project

ef1311e

Added testing of plotting func to the main script

6ce6ea9

Phillip-Oliver approved these changes Feb 23, 2023

Ilia Nikolaenko added 7 commits February 24, 2023 09:13

Refactored create_df_from_list function

7619049

Uses Union Type Annotations now: https://docs.python.org/3.10/whatsnew/3.10.html#pep-604-new-type-union-operator and https://stackoverflow.com/questions/38854282/do-union-types-actually-exist-in-python#38854368

Created plots script, added docstrings to signals groupping and plott…

544fecc

…ing methods

FIX: Upd parsing JSON signal_data with ORM cls of SQL Alchemy

da4fd43

removed todo

7b921b3

Added caster filter for extract_root_cause_pairs + TODO

27efbdf

Single value-tuple does not with f-Strings SQL Queries that contain WITH IN!

Added TODO to the last code from jupiter notebook

2229db8

cleaned up the main script

2d45645

il-ia-ni merged commit 2d45645 into main Feb 24, 2023

il-ia-ni deleted the feature-joins-analysis-w-pandas branch February 24, 2023 14:30

il-ia-ni merged 60 commits into main from feature-joins-analysis-w-pandas
