# Possible pitfalls

In [None]:
using DataFrames
using BenchmarkTools

## Know what is copied when creating a DataFrame

In [None]:
x = DataFrame(rand(3, 5), :auto)

`copy` creates a new data frame, so `x` and `y` are not the same object:

In [None]:
y = copy(x)
x === y

The `DataFrame` constructor also copies by default, so `x` and `y` are again not the same object:

In [None]:
y = DataFrame(x)
x === y

The columns are also not the same:

In [None]:
any(x[!, i] === y[!, i] for i in 1:ncol(x))

With `copycols=false`, `x` and `y` are still not the same object:

In [None]:
y = DataFrame(x, copycols=false)
x === y

But the columns are the same:

In [None]:
all(x[!, i] === y[!, i] for i in 1:ncol(x))
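
Shared columns mean that an in-place write through one data frame is visible in the other. The following is a minimal sketch of that consequence; the names `a`, `b`, and `p` are illustrative and not part of the notebook:

```julia
using DataFrames

a = DataFrame(p=[1, 2, 3])
b = DataFrame(a, copycols=false)  # b reuses the column vectors of a

b[1, :p] = 100                    # in-place write through b
a.p[1] == 100                     # the change is visible through a as well
```

If independent data is needed later, `copy(b)` gives a data frame with its own columns.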

The same rules apply when creating a data frame using keyword argument syntax:

In [None]:
x = 1:3;
y = [1, 2, 3];
df = DataFrame(x=x, y=y)

The column is a different object than the source vector:

In [None]:
y === df.y

A range is converted to a vector:

In [None]:
typeof(x), typeof(df.x)

Indexing with `:` as the row selector always creates a copy:

In [None]:
y === df[:, :y]

You can avoid copying by passing the `copycols=false` keyword argument to functions that support it.

In [None]:
df = DataFrame(x=x, y=y, copycols=false)

Now the column is the same object as the source vector:

In [None]:
y === df.y

`select` copies columns by default, so the result is not the same object:

In [None]:
select(df, :y)[!, 1] === y

With `copycols=false`, it is the same object:

In [None]:
select(df, :y, copycols=false)[!, 1] === y

## Do not modify the parent of `GroupedDataFrame` or view

In [None]:
x = DataFrame(id=repeat([1, 2], outer=3), x=1:6)
g = groupby(x, :id)

x[1:3, 1] = [2, 2, 2]
g ## g is now wrong: it is only a view and still uses the grouping computed before x changed
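
One way to stay safe is to finish modifying the parent before grouping, or to call `groupby` again after the change. A minimal sketch of that pattern, using the illustrative names `t`, `v`, and `g2` that do not appear in the notebook:

```julia
using DataFrames

t = DataFrame(id=repeat([1, 2], outer=3), v=1:6)

t[1:3, :id] = [2, 2, 2]   # modify the parent first
g2 = groupby(t, :id)      # then build the grouping from the updated data
length(g2)                # number of groups found in the updated ids
```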

In [None]:
s = view(x, 5:6, :)

In [None]:
deleteat!(x, 3:6)

Accessing `s` is now an error, because the view refers to rows that no longer exist in its parent:

```julia
s ## throws BoundsError
```
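
If the selected rows must survive changes to the parent, one option is to materialize the view into an independent data frame before modifying the parent. The sketch below assumes DataFrames.jl 1.3 or later (for `deleteat!`) and uses the illustrative names `t` and `s2`:

```julia
using DataFrames

t = DataFrame(id=1:6)
s2 = copy(view(t, 5:6, :))  # copying a view yields an independent DataFrame

deleteat!(t, 3:6)           # shrinking the parent no longer affects s2
s2
```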

## Single column selection for a `DataFrame`

Selecting a single column with `!` or with `getproperty` syntax (for example `x.a`) returns an alias of the stored vector, while selecting with `:` returns a copy.

In [None]:
x = DataFrame(a=1:3)
x.b = x[!, 1] ## alias
x.c = x[:, 1] ## copy
x.d = x[!, 1][:] ## copy
x.e = copy(x[!, 1]) ## explicit copy
display(x)

In [None]:
x[1, 1] = 100
display(x)
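
Instead of reading the displayed output, aliasing can also be checked directly with `===`. A minimal sketch with an illustrative data frame `t`:

```julia
using DataFrames

t = DataFrame(a=1:3)
t.b = t[!, 1]             # alias of column a
t.c = t[:, 1]             # copy of column a

t.b === t.a, t.c === t.a  # identity check: alias versus copy

t[1, 1] = 100             # write to column a in place
t.b[1], t.c[1]            # the alias follows the change, the copy does not
```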

## When iterating rows of a data frame

- use `eachrow` to avoid high compilation cost in wide tables,
- use `Tables.namedtupleiterator` for fast execution in tall tables.

The table below is tall (many rows, few columns):

In [None]:
df2 = DataFrame(rand(10^6, 10), :auto)

In [None]:
@time map(sum, eachrow(df2));

In [None]:
@time map(sum, eachrow(df2));

In [None]:
@time map(sum, Tables.namedtupleiterator(df2));

In [None]:
@time map(sum, Tables.namedtupleiterator(df2));

As you can see, this time it is much faster to iterate a type-stable container.
Still, you might want to use the `select` syntax, which is optimized for such reductions.

The first call includes compilation time:

In [None]:
@time select(df2, AsTable(:) => ByRow(sum) => "sum").sum

Run it again to measure execution time without compilation:

In [None]:
@time select(df2, AsTable(:) => ByRow(sum) => "sum").sum
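
Since `BenchmarkTools` is loaded at the top of the notebook, `@btime` could be used in place of repeated `@time` calls: it runs the expression several times and reports the minimum, so compilation is not counted. This is a sketch only; the smaller table `small` is a hypothetical stand-in chosen to keep the benchmark quick, and `$` interpolates the global variable into the benchmarked expression:

```julia
using DataFrames
using BenchmarkTools

small = DataFrame(rand(10^4, 10), :auto)  # hypothetical smaller tall table

@btime map(sum, eachrow($small));
@btime map(sum, Tables.namedtupleiterator($small));
@btime select($small, AsTable(:) => ByRow(sum) => "sum").sum;
```

All three expressions compute the same row sums, so the timings are directly comparable.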

---

*This notebook was generated using [Literate.jl](https://github.com/fredrikekre/Literate.jl).*