## Format data from SCAAR so the output can be loaded into DosReg

Go to https://www.ucr.uu.se/swedeheart/ and log in with your SITHS card\
Go to "Rapporter" (Reports)\
Select "Export till Excel Angio-PCI"\
Rapportdatum (report date): e.g. 2024-01-01 - 2025-01-01\
Angio/PCI: Enbart angio (angio only)\
Procedur/Segment: Procedur\
Click "Beställ" (Order) and download the resulting Excel file.\
Open the Excel file, delete the first 3 columns ("Personnr eller motsv", "Typ av personnummer" and "Födelsedatum") and save it to input_data.
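
The manual column deletion in the last step could also be scripted. The following is a minimal sketch, assuming the export's headers match the three names quoted above exactly; `drop_identifier_columns` and the toy frame are hypothetical and not part of the original workflow.

```python
import pandas as pd

# Column names as quoted in the instructions above; assumed to match the export exactly.
IDENTIFIER_COLUMNS = ["Personnr eller motsv", "Typ av personnummer", "Födelsedatum"]


def drop_identifier_columns(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df without the personal-identifier columns."""
    # errors="ignore" keeps the call safe if the columns were already removed by hand.
    return df.drop(columns=IDENTIFIER_COLUMNS, errors="ignore")


# Made-up stand-in for the export (not real registry data).
raw = pd.DataFrame({
    "Personnr eller motsv": ["000000-0000"],
    "Typ av personnummer": ["Test"],
    "Födelsedatum": ["1900-01-01"],
    "Kön": ["Man"],
})
cleaned = drop_identifier_columns(raw)
print(list(cleaned.columns))
```

Scripting this step means the unredacted file is read from disk first, so the manual deletion may still be preferable where identifiers must not be stored.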


In [1]:
import pandas as pd

# Read in the data from SCAAR

SCAAR_data_path = "C:/Projekt/GIT/rvbrtg/Data/input_data/Angio_2024.xlsx"
data = pd.read_excel(SCAAR_data_path)

#data.head()

In [2]:
# Reduce the table to the columns of interest and rename them

data_subset = data[["Kön", "Ålder vid procedur", "Längd (cm)", "Vikt (kg)", "Angiograför", "Punktionställe", "Labnamn", "Stråldos (µGym2)", "Genomlysningstid (h:mm:ss)"]].copy()

data_subset.columns = ["Sex", "Age", "Length_cm", "Weight_kg", "Operator", "Accesspoint", "Lab", "KAP_uGym2", "Fluorotime_h_mm_ss"]

data_subset.head()

Unnamed: 0,Sex,Age,Length_cm,Weight_kg,Operator,Accesspoint,Lab,KAP_uGym2,Fluorotime_h_mm_ss
0,Kvinna,87,160.0,67.0,"Hasslow, Jacob",A radialis höger,Lab 1,42200,00:05:05
1,Man,86,172.0,90.0,"Pettersson, Björn",A radialis vänster,Lab 1,149000,00:05:37
2,Kvinna,86,,,"Andersson, Jonas",A femoralis konverterad från radialis,Lab 1,15200,00:03:05
3,Man,86,171.0,75.0,"Hagström, Henrik",A radialis höger,Lab 1,74900,00:04:03
4,Man,85,,,"Pettersson, Björn",A radialis höger,Lab 1,104000,00:06:55


In [3]:
# Check the data types. Is KAP numeric or a string?

data_subset.dtypes

Sex                    object
Age                     int64
Length_cm             float64
Weight_kg             float64
Operator               object
Accesspoint            object
Lab                    object
KAP_uGym2              object
Fluorotime_h_mm_ss     object
dtype: object

In [4]:
# Replace ',' with '.', convert from string to float, and convert from µGy·m² to Gy·cm² (1 µGy·m² = 0.01 Gy·cm²)
data_subset["KAP_uGym2"] = data_subset["KAP_uGym2"].replace(',','.',regex=True).astype(float)
data_subset["KAP_uGym2"] = data_subset["KAP_uGym2"] * 0.01

data_subset.rename(columns = {"KAP_uGym2":"KAP_Gycm2"}, inplace = True)

#data_subset.dtypes
data_subset.head()

Unnamed: 0,Sex,Age,Length_cm,Weight_kg,Operator,Accesspoint,Lab,KAP_Gycm2,Fluorotime_h_mm_ss
0,Kvinna,87,160.0,67.0,"Hasslow, Jacob",A radialis höger,Lab 1,4.22,00:05:05
1,Man,86,172.0,90.0,"Pettersson, Björn",A radialis vänster,Lab 1,14.9,00:05:37
2,Kvinna,86,,,"Andersson, Jonas",A femoralis konverterad från radialis,Lab 1,1.52,00:03:05
3,Man,86,171.0,75.0,"Hagström, Henrik",A radialis höger,Lab 1,7.49,00:04:03
4,Man,85,,,"Pettersson, Björn",A radialis höger,Lab 1,10.4,00:06:55
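
The factor 0.01 used above follows from 1 µGy = 10⁻⁶ Gy and 1 m² = 10⁴ cm², so 1 µGy·m² = 10⁻² Gy·cm². Below is a minimal sketch of the same conversion for a single value, assuming the source strings use a decimal comma; `kap_ugym2_to_gycm2` and the sample value are hypothetical.

```python
# 1 µGy = 1e-6 Gy and 1 m² = 1e4 cm², so the combined factor is 1e-2.
UGYM2_TO_GYCM2 = 1e-6 * 1e4


def kap_ugym2_to_gycm2(value):
    """Convert a KAP value in µGy·m² (number or decimal-comma string) to Gy·cm²."""
    if isinstance(value, str):
        # Assumption: strings use ',' as the decimal separator, as in the export.
        value = float(value.replace(",", "."))
    return value * UGYM2_TO_GYCM2


# Made-up sample value, not taken from the registry.
print(round(kap_ugym2_to_gycm2("422,00"), 4))
```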


In [5]:
# Keep only procedures where the patient weight is strictly between 60 and 90 kg
data_subset_weight = data_subset[(data_subset["Weight_kg"] > 60) & (data_subset["Weight_kg"] < 90)]

data_subset_weight.head()

Unnamed: 0,Sex,Age,Length_cm,Weight_kg,Operator,Accesspoint,Lab,KAP_Gycm2,Fluorotime_h_mm_ss
0,Kvinna,87,160.0,67.0,"Hasslow, Jacob",A radialis höger,Lab 1,4.22,00:05:05
3,Man,86,171.0,75.0,"Hagström, Henrik",A radialis höger,Lab 1,7.49,00:04:03
5,Man,85,178.0,76.0,"Andersson, Jonas",A radialis höger,Lab 1,3.58,00:01:26
6,Kvinna,85,159.0,66.0,"Hagström, Henrik",A radialis vänster,Lab 1,2.39,00:03:42
8,Kvinna,83,165.0,65.0,"Andersson, Jonas",A femoralis konverterad från radialis,Lab 2,2.83,00:02:47
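
Note that the comparison-based filter above is exclusive at both bounds and also drops rows with a missing weight, since comparisons against NaN evaluate to False. Here is a sketch of the same selection using `Series.between`, on made-up values and assuming a pandas version that accepts string values for `inclusive` (1.3 or later):

```python
import numpy as np
import pandas as pd

# Made-up weights, including one missing value and both boundary values.
toy = pd.DataFrame({"Weight_kg": [67.0, np.nan, 90.0, 75.0, 60.0]})

# inclusive="neither" reproduces the strict > 60 and < 90 bounds.
in_range = toy["Weight_kg"].between(60, 90, inclusive="neither")
selected = toy[in_range].copy()

# Count how many rows are excluded only because the weight is missing.
n_missing = int(toy["Weight_kg"].isna().sum())
print(len(selected), "rows kept;", n_missing, "row(s) excluded for missing weight")
```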


In [6]:
# Print the count per sex and the mean KAP, to enter into the DosReg template

print(data_subset_weight.groupby("Sex").size())

print(data_subset_weight.groupby("Sex").mean(numeric_only = True))

Sex
Kvinna     98
Man       178
dtype: int64
              Age  Length_cm  Weight_kg  KAP_Gycm2
Sex                                               
Kvinna  69.897959  163.30000  72.030612   4.670408
Man     68.932584  176.11976  78.955056   5.834775
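
The count and the mean KAP per sex can also be collected in one table, which may be easier to copy into the template. The sketch below uses pandas named aggregation on made-up values; the output column names `n` and `mean_KAP_Gycm2` are arbitrary choices.

```python
import pandas as pd

# Made-up values, not registry data.
toy = pd.DataFrame({
    "Sex": ["Kvinna", "Man", "Man"],
    "KAP_Gycm2": [4.0, 5.0, 7.0],
})

# "size" counts all rows per group; "mean" ignores missing KAP values.
summary = toy.groupby("Sex").agg(
    n=("KAP_Gycm2", "size"),
    mean_KAP_Gycm2=("KAP_Gycm2", "mean"),
)
print(summary)
```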


In [7]:
# Ensure 'Fluorotime_h_mm_ss' is in string format before converting to timedelta
data_subset_weight["Fluorotime_h_mm_ss"] = data_subset_weight["Fluorotime_h_mm_ss"].astype(str)

# Convert 'Fluorotime_h_mm_ss' to seconds for numerical aggregation
data_subset_weight["Fluorotime_seconds"] = pd.to_timedelta(data_subset_weight["Fluorotime_h_mm_ss"]).dt.total_seconds()

# Group by 'Lab' and calculate median values for the specified columns (despite its name, mean_values holds medians here)
mean_values = data_subset_weight.groupby("Lab").median(numeric_only=True)[["Age", "Length_cm", "Weight_kg", "KAP_Gycm2"]]

# Add the mean (not the median) of 'Fluorotime_seconds' per lab
mean_values["Fluorotime_seconds"] = data_subset_weight.groupby("Lab")["Fluorotime_seconds"].mean()

# Convert 'Fluorotime_seconds' back to h:mm:ss format
mean_values["Fluorotime_h_mm_ss"] = pd.to_timedelta(mean_values["Fluorotime_seconds"], unit='s')

# Drop the intermediate 'Fluorotime_seconds' column
mean_values = mean_values.drop(columns=["Fluorotime_seconds"])

# Display the result
print(mean_values)

        Age  Length_cm  Weight_kg  KAP_Gycm2        Fluorotime_h_mm_ss
Lab                                                                   
Lab 1  70.0      173.0       78.0       4.22 0 days 00:03:44.502645503
Lab 2  72.0      170.0       75.0       4.20 0 days 00:03:51.724137931
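
The timedelta output above includes a "0 days" prefix and sub-second digits. If a plain h:mm:ss string is wanted, the averaged seconds can be formatted by hand. This is a minimal sketch on made-up times; `format_h_mm_ss` is a hypothetical helper, and rounding to whole seconds is an assumption about the desired precision.

```python
import pandas as pd


def format_h_mm_ss(seconds: float) -> str:
    """Format a duration in seconds as h:mm:ss, rounded to whole seconds."""
    total = int(round(seconds))
    hours, remainder = divmod(total, 3600)
    minutes, secs = divmod(remainder, 60)
    return f"{hours}:{minutes:02d}:{secs:02d}"


# Made-up fluoroscopy times in the same h:mm:ss string format as the export.
times = pd.Series(["00:05:05", "00:03:05"])

# Convert to seconds, average, and format back to a string.
mean_seconds = pd.to_timedelta(times).dt.total_seconds().mean()
print(format_h_mm_ss(mean_seconds))
```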


A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  data_subset_weight["Fluorotime_h_mm_ss"] = data_subset_weight["Fluorotime_h_mm_ss"].astype(str)
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  data_subset_weight["Fluorotime_seconds"] = pd.to_timedelta(data_subset_weight["Fluorotime_h_mm_ss"]).dt.total_seconds()
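
The warnings above appear because `data_subset_weight` was created by boolean indexing and then had columns assigned to it, so pandas cannot tell whether the assignment is meant to affect the original frame. The results printed above are still computed. One common way to avoid the warning is to take an explicit copy when filtering, sketched here on made-up rows:

```python
import pandas as pd

# Made-up rows, not registry data.
toy = pd.DataFrame({
    "Weight_kg": [67.0, 95.0, 75.0],
    "Fluorotime_h_mm_ss": ["00:05:05", "00:05:37", "00:04:03"],
})

# .copy() makes the filtered frame independent of `toy`, so adding columns is unambiguous.
filtered = toy[(toy["Weight_kg"] > 60) & (toy["Weight_kg"] < 90)].copy()
filtered["Fluorotime_seconds"] = pd.to_timedelta(filtered["Fluorotime_h_mm_ss"]).dt.total_seconds()
print(filtered)
```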
